Linearization of optimal rates for independent zero-error source and channel problems
Nicolas Charpenay, Maël Le Treust, and Aline Roumy
The work of Nicolas Charpenay was conducted during his PhD at IRISA UMR 6074 and Centre Inria de l’Université de Rennes, funded by CDSN ENS Paris-Saclay. This work was presented in part at the IEEE Information Theory Workshop (ITW) 2023 in Saint-Malo, France, [DOI: 10.1109/ITW55543.2023.10161637], and in part at the IEEE International Symposium on Information Theory (ISIT) 2023 in Taipei, Taiwan, [DOI: 10.1109/ISIT54713.2023.10206770]. Nicolas Charpenay is with Univ Rennes, CNRS, IRMAR UMR 6625, F-35000 Rennes, France (e-mail: [email protected]). Maël Le Treust is with Univ. Rennes, CNRS, Inria, IRISA UMR 6074, F-35000 Rennes, France (e-mail: [email protected]). Aline Roumy is with Centre Inria de l’Université de Rennes, France (e-mail: [email protected]).
Abstract
Zero-error coding encompasses a variety of source and channel problems where the probability of error must be exactly zero. The zero-error constraint differs from the vanishing-error constraint, which only requires the probability of error to go to zero when the block length of the code goes to infinity. Under the zero-error constraint, many problems change from a statistical nature to a combinatorial one, which is tied to the encoder's lack of knowledge about what is observed by the decoder. In this paper, we investigate two unsolved zero-error problems: source coding with side information and channel coding. We focus our attention on families of independent problems for which the distribution decomposes into a product of distributions, corresponding to solved zero-error problems. A crucial step is the linearization property of the optimal rate, which does not always hold in the zero-error regime, unlike in the vanishing-error regime. By generalizing recent results of Wigderson and Zuiddam, and of Schrijver, we derive a condition under which the linearization properties of the complementary graph entropy $\overline{H}$ and of the zero-error capacity $C_0$, for the AND product of graphs and for the disjoint union of graphs, are all equivalent. This provides new single-letter characterizations of $\overline{H}$ and $C_0$, for example when the graph is a product of perfect graphs, which is not perfect in general, and for the class of graphs obtained by the product of a perfect graph with the pentagon graph $C_5$. By building on Haemers' result, we also show that the linearization of the complementary graph entropy does not hold for the product of the Schläfli graph with its complementary graph.
I Introduction
Transmitting information without any errors has been a concern for Shannon since the beginning of his work. In his seminal paper [1], Shannon proposed a construction for zero-error source coding, a problem soon solved by Huffman in [2]. Shortly after establishing the channel capacity in [1], Shannon turned his attention to channel coding with zero error in [3], instead of vanishing error. This subtle difference radically changes the nature of the problem, which becomes essentially combinatorial rather than probabilistic. The single-letter characterization of the zero-error capacity is a notoriously difficult open problem. For example, the zero-error capacity of the noisy-typewriter channel with $7$ letters is unknown; some lower and upper bounds are stated in [4], [5], and [6]. In fact, the zero-error property only depends on the support of the channel conditional distribution, not on the probability values. More precisely, the zero-error property is translated into the characteristic graph that encompasses the problem data in its structure: the vertices are the channel inputs $x \in \mathcal{X}$, and two symbols $x$ and $x'$ are adjacent if they are “confusable”, i.e. if they can produce the same channel output with positive probability. For sequences of symbols $x^n \in \mathcal{X}^n$, the characteristic graph is obtained by iteratively taking the AND product ($\wedge$) of the graphs. In order to prevent any decoding error, a zero-error codebook must be composed of pairwise non-adjacent codewords. Thus, the size of the optimal codebook is given by the size of a maximum independent set, called the independence number $\alpha$. More specifically, the zero-error capacity is the asymptotic limit of the normalized logarithm of the independence number of iterated AND products of the characteristic graph. This means that all channel distributions with the same characteristic graph (or equivalently, with the same support) have the same zero-error capacity. Over time, this open question has attracted a lot of attention in Information Theory [7, Chap. 11] and in Combinatorics and Graph Theory, see [8, Chap. 27].
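To make the graph-theoretic reformulation concrete, the following Python sketch builds the characteristic graph of a noisy-typewriter channel from the support of its transition probabilities and computes the one-shot independence number by brute force. The channel model (each letter is received either as itself or as the next letter, cyclically) and all function names are illustrative assumptions, and the exhaustive search is only meant for a handful of vertices.

```python
from itertools import combinations

def noisy_typewriter_support(q):
    """Assumed channel model: letter x is received as x or x+1 (mod q)."""
    return {x: {x, (x + 1) % q} for x in range(q)}

def characteristic_graph(support):
    """Edges {x, x'} between distinct inputs that can produce a common output."""
    return {frozenset((a, b)) for a, b in combinations(sorted(support), 2)
            if support[a] & support[b]}

def independence_number(vertices, edges):
    """Brute-force alpha(G): size of a largest set of pairwise non-adjacent vertices."""
    vertices = list(vertices)
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            if all(frozenset(pair) not in edges for pair in combinations(subset, 2)):
                return size
    return 0

edges5 = characteristic_graph(noisy_typewriter_support(5))
print(sorted(tuple(sorted(e)) for e in edges5))  # [(0, 1), (0, 4), (1, 2), (2, 3), (3, 4)]
print(independence_number(range(5), edges5))     # 2
```

With 5 letters the characteristic graph is the pentagon $C_5$, and a one-shot zero-error codebook has at most 2 codewords; with 7 letters the same code returns the 7-cycle, whose one-shot independence number is 3.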
Figure 1: The characteristic graph $C_7$ of the noisy-typewriter channel with $7$ letters.
This problem inspired Berge’s notion of perfect graphs [9, pp. 382], for which the zero-error capacity is given by the one-shot independence number [10, Theorem 4.18]. Odd cycles are also related to Berge’s conjecture [11], later proved by Chudnovsky et al. in [12]: a graph $G$ is perfect if and only if neither $G$ nor its complementary graph $\overline{G}$ has an induced odd cycle of length $5$ or more. Since the zero-error capacity of the pentagon graph $C_5$ has been characterized by Shannon [3] and Lovász [13], as has the zero-error capacity of perfect graphs, see [10, Theorem 4.18], the graph $C_7$ depicted in Fig. 1 is the minimal connected graph for which the zero-error capacity is still an open problem.
In the source coding framework, an important unsolved zero-error problem was posed by Witsenhausen in [14] when the decoder has side information. In this problem, depicted in Fig. 2, the encoder transmits information about a source $X$, exploiting the side information $Y$ observed by the decoder but not by the encoder itself. In the vanishing error regime, Slepian and Wolf showed in [15] that the optimal rate is $H(X|Y)$, but no single-letter characterization is available in the zero-error regime. As for the channel coding problem, the zero-error property is embedded into the characteristic graph constructed with respect to the conditional distribution $P_{Y|X}$ of the “side-information channel”. The main difference with the channel coding problem is that the source distribution $P_X$ is fixed. Witsenhausen showed in [14] that the fixed-length encoding task becomes equivalent to graph coloring. In [16], Alon and Orlitsky considered the variable-length version, determining an asymptotic expression for the optimal rate based on the chromatic entropy $H_\chi$. In [17], Koulgi et al. proved that the optimal rate coincides with the complementary graph entropy $\overline{H}$, defined by Körner and Longo in [18]. These two expressions are asymptotic, as the optimal rates are determined by coloring an infinite product of graphs. Single-letter characterizations are known only in the same cases as for the zero-error capacity, such as perfect graphs and the pentagon graph $C_5$. Instead, the Körner graph entropy [19] provides a single-letter expression for the unrestricted input setting of Alon and Orlitsky in [16], where the zero-error constraint is satisfied even outside the source’s support, which yields an upper bound on the optimal rate.
Figure 2: The source coding problem with decoder side information, called side-information problem.
The difficulty of the zero-error problem comes from the fact that the knowledge of the decoder is not included in that of the encoder, since the side information is available only at the decoder. Indeed, giving the side information to the encoder as well makes it possible to implement a zero-error conditional Huffman code [2] that achieves the rate $H(X|Y)$ asymptotically, as in the vanishing error regime [15]. The same asymmetry of knowledge poses the same difficulty in the zero-error channel coding problem. Note that this difficulty is mitigated when the encoder observes the past channel output symbols, which led Shannon to characterize the zero-error capacity with feedback in [3].
In this paper we focus our attention on zero-error problems composed of a family of independent problems. First, we consider that the source and side information decompose into a family of independent random variables $(X_i, Y_i)_{i\in\mathcal{I}}$. This induces a characteristic graph with a specific structure, given by the AND product $\bigwedge_{i\in\mathcal{I}} G_i$ of the graphs of the subproblems. Such a decomposition has two benefits: it reduces the problem complexity and it provides new cases for which the complementary graph entropy is characterized. In the vanishing error regime, the optimal rate is equal to the sum of the optimal rates of the subproblems; we say that the optimal rate linearizes. By building on Haemers' results for the zero-error capacity of the Schläfli graph in [20], we show that independence alone does not ensure the linearization of the complementary graph entropy, in contrast with a standard property of the vanishing error regime. Thus, showing the linearization of the optimal rates becomes crucial.
Our first contribution is to show that the linearization of the complementary graph entropy holds for the AND product of graphs if and only if it holds for the disjoint union of graphs ($\sqcup$), also called “sum of graphs” in [21]. Recently, Wigderson and Zuiddam in [22] and Schrijver in [23] showed a similar statement for the zero-error capacity: the linearization holds for the AND product if and only if it holds for the disjoint union. We then explore the consequences of these two statements, for which the characteristic graphs are defined similarly. An important difference concerns the probability distribution, which is specified in the source problem but a priori unspecified in the zero-error channel coding problem.
A natural notion related to both $\overline{H}$ and $C_0$ is the zero-error capacity $C_0(G, P)$ of a graph $G$ relative to a distribution $P$, introduced by Csiszár and Körner in [24]. By taking the maximum over the distribution $P$, Gargano et al. [25] showed that it is equal to the zero-error capacity:
$$C_0(G) = \max_{P} C_0(G, P). \tag{1}$$
Moreover, Marton showed in [26] that the complementary graph entropy satisfies
$$\overline{H}(G, P) = H(P) - C_0(G, P). \tag{2}$$
Equations (1) and (2) are the zero-error analogues of the channel capacity formula $C = \max_{P_X} I(X;Y)$ and of the identity $H(X|Y) = H(X) - I(X;Y)$ in the vanishing error regime. The main contribution of the paper is to show that the linearization properties of $\overline{H}$, $C_0(\cdot, P)$ and $C_0$, for the AND product and for the disjoint union, are all equivalent, provided that the source distribution $P$ maximizes $C_0(\cdot, P)$ when evaluated with respect to the AND product of graphs.
These linearization properties enlarge the class of problems for which $\overline{H}$ and $C_0$ have a single-letter characterization. For perfect graphs, we show that the linearizations of $\overline{H}$ and $C_0$ always hold for the AND product and for the disjoint union. As a consequence, we determine new single-letter characterizations for products of perfect graphs, which are not perfect in general, and for the product of a perfect graph with the pentagon graph $C_5$, by building on the characterization stated in [21].
A crucial notion is the set of capacity-achieving distributions, which contains all the distributions $P$ for which $C_0(G, P) = C_0(G)$. We show that the uniform distribution is capacity-achieving when the graph $G$ is vertex-transitive, i.e. when all vertices play the same role within the graph. Since the Schläfli graph $G_S$ and its complementary graph $\overline{G_S}$ are vertex-transitive, as is their product $G_S \wedge \overline{G_S}$, the uniform distributions are capacity-achieving for $G_S$, $\overline{G_S}$ and $G_S \wedge \overline{G_S}$. Together with Haemers' result [20], this provides a counterexample where the linearizations of $\overline{H}$, $C_0(\cdot, P)$ and $C_0$ for the AND product and the disjoint union of $G_S$ and $\overline{G_S}$ do not hold.
In Sec. II, we study the linearization of the complementary graph entropy for the source problem with side information. The connection with the linearization of the zero-error capacity is investigated in Sec. III. New single-letter characterizations for $\overline{H}$ and $C_0$ are provided in Sec. IV, as well as the counterexample to linearization based on the Schläfli graph.
II Zero-Error Source Coding With Decoder Side Information
II-A Problem Statement and Results from the Literature
The zero-error source coding problem with decoder side information is depicted in Fig. 2. It corresponds to a situation in data compression where the decoder has side-information about the source that has to be retrieved. This problem was formulated by Slepian and Wolf in [15] in the vanishing error regime and by Witsenhausen in [14] for the zero-error variant. We call this the side-information problem.
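More formally, we assume that a sequence $(X^n, Y^n)$ of i.i.d. random variables of length $n$ is drawn according to $P_{XY} \in \Delta(\mathcal{X} \times \mathcal{Y})$, where $\mathcal{X}$ and $\mathcal{Y}$ are finite sets.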
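We consider variable-length source coding, which encompasses the special case of fixed-length source coding. An $(n, \phi, \psi)$ variable-length side-information source code for $P_X$ and $P_{Y|X}$ consists of
-
an encoder $\phi: \mathcal{X}^n \to \{0,1\}^*$ that assigns to each $x^n \in \mathcal{X}^n$ a binary string $\phi(x^n)$, such that $\phi(\mathcal{X}^n)$ is prefix-free,
-
a decoder $\psi: \{0,1\}^* \times \mathcal{Y}^n \to \mathcal{X}^n$ that assigns an estimate $\hat{x}^n = \psi(\phi(x^n), y^n)$ to each pair $(\phi(x^n), y^n)$.
The rate of the code is the average length of the codeword per source symbol, i.e. $\frac{1}{n}\mathbb{E}\big[\ell(\phi(X^n))\big]$, where $\ell$ denotes the length of a binary string, and the probability of error is $P_e = \mathbb{P}\big(\psi(\phi(X^n), Y^n) \neq X^n\big)$.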
Definition 1.
The optimal rate in the vanishing error regime is the minimal rate among all codes that satisfy the $\epsilon$-error constraint $P_e \le \epsilon$, with $\epsilon > 0$:
(3)
The optimal rate in the zero-error regime is the minimal rate among all coding schemes that satisfy the zero-error constraint:
(4)
When the side information $Y$ is available at both the encoder and the decoder, the optimal rates in the vanishing and zero-error regimes are both equal to $H(X|Y)$. The zero-error coding construction relies on conditional Huffman coding [2]. In the side-information problem, the encoder does not observe the side information $Y$.
This asymmetry of information has a consequence: the optimal rates in the vanishing and zero-error regimes differ in general.
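When $Y$ is also available at the encoder, a separate Huffman code can be designed for each value of $y$. The Python sketch below (illustrative function names; the joint distribution is a made-up example with dyadic conditional probabilities) computes the expected length of such a one-shot conditional Huffman code and compares it with $H(X|Y)$. In general a one-shot Huffman code is within one bit of $H(X|Y)$, and the rate $H(X|Y)$ is approached by coding blocks of symbols.

```python
import heapq
from math import log2

def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for the probabilities `probs`."""
    if len(probs) == 1:
        return [0]  # only one possible symbol: the empty codeword suffices
    # heap items: (subtree probability, unique tie-breaker, symbols in the subtree)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tie = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1  # each merge adds one bit to every symbol below it
        heapq.heappush(heap, (p1 + p2, tie, s1 + s2))
        tie += 1
    return lengths

def conditional_huffman_rate(p_xy):
    """Expected length when a separate Huffman code is used for each value y.
    p_xy[y][x] is the joint probability P_XY(x, y); rows with zero mass are skipped."""
    rate = 0.0
    for row in p_xy:
        p_y = sum(row)
        if p_y == 0:
            continue
        cond = [p / p_y for p in row if p > 0]  # P(x | y) on its support
        rate += p_y * sum(q * l for q, l in zip(cond, huffman_lengths(cond)))
    return rate

def conditional_entropy(p_xy):
    """H(X|Y) in bits, with the same p_xy[y][x] layout."""
    return -sum(p * log2(p / sum(row)) for row in p_xy for p in row if p > 0)

p_xy = [[0.25, 0.25, 0.0], [0.125, 0.125, 0.25]]
print(conditional_huffman_rate(p_xy), conditional_entropy(p_xy))  # 1.25 1.25
```

For this dyadic example the conditional Huffman code meets $H(X|Y)$ exactly; for non-dyadic conditionals its expected length lies in $[H(X|Y), H(X|Y) + 1)$.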
The nature of the problem changes when considering an error probability equal to zero, instead of a vanishing error probability. In the zero-error regime, the characterization of the optimal rate is a notoriously difficult open problem of combinatorial nature. The key features of the side-information problem are captured by the “characteristic graph” introduced by Witsenhausen in [14], which we review below.
Definition 2 (Characteristic graph).
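Let $\mathcal{X}, \mathcal{Y}$ be two finite sets and $P_{Y|X}$ be a conditional distribution. The characteristic graph $G$ associated to $P_{Y|X}$ is defined by:
-
$\mathcal{X}$ as set of vertices,
-
$x \neq x'$ are adjacent, denoted $x \sim x'$, if $P_{Y|X}(y|x)\, P_{Y|X}(y|x') > 0$ for some $y \in \mathcal{Y}$.
A characteristic graph becomes a probabilistic graph $(G, P_X)$ when it is equipped with the underlying distribution $P_X$ on its vertices.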
The meaning of the characteristic graph is that, when the side information does not allow the decoder to distinguish between the source realizations $x$ and $x'$, then $x$ and $x'$ are adjacent and must be mapped to different codewords. Therefore, a zero-error encoding is a graph coloring, in which adjacent vertices are mapped to different colors.
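The coloring interpretation can be checked on a toy example. In the Python sketch below, the side-information channel `P_Y_GIVEN_X` is a hypothetical four-symbol example whose characteristic graph is a 4-cycle. The encoder sends only the color of $x$, and the decoder recovers $x$ from the color and $y$, because two source symbols with the same color are non-adjacent and hence never share a side-information symbol. Greedy coloring is used for simplicity; it does not minimize the number of colors in general.

```python
from itertools import combinations

# Hypothetical side-information channel: P_Y_GIVEN_X[x][y] is the probability of y given x.
P_Y_GIVEN_X = {
    "a": {0: 0.5, 1: 0.5},
    "b": {1: 0.5, 2: 0.5},
    "c": {2: 0.5, 3: 0.5},
    "d": {3: 0.5, 0: 0.5},
}

def support(x):
    """Side-information symbols that can occur together with source symbol x."""
    return {y for y, q in P_Y_GIVEN_X[x].items() if q > 0}

def characteristic_graph():
    """x ~ x' iff some y has positive probability under both x and x' (Definition 2)."""
    return {frozenset((x, xp)) for x, xp in combinations(sorted(P_Y_GIVEN_X), 2)
            if support(x) & support(xp)}

def greedy_coloring(vertices, edges):
    """Proper coloring: each vertex takes the smallest color not used by a colored neighbor."""
    color = {}
    for v in vertices:
        used = {color[u] for u in color if frozenset((u, v)) in edges}
        color[v] = min(c for c in range(len(vertices)) if c not in used)
    return color

EDGES = characteristic_graph()
COLOR = greedy_coloring(sorted(P_Y_GIVEN_X), EDGES)

def decode(c, y):
    """Zero-error decoder: the unique source symbol with color c that is compatible with y."""
    (x,) = [x for x in P_Y_GIVEN_X if COLOR[x] == c and y in support(x)]
    return x

print(COLOR)  # {'a': 0, 'b': 1, 'c': 0, 'd': 1}: 2 colors instead of 4 symbols
```

The unpacking `(x,) = ...` raises an error if zero or several candidates remain, so running `decode` over the whole support is a direct check of the zero-error property.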
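Definition 3 (Coloring, chromatic number $\chi$).
Let $G = (\mathcal{V}, \mathcal{E})$ be a graph. A map $c: \mathcal{V} \to \mathcal{C}$ is a coloring if for all adjacent vertices $v, v' \in \mathcal{V}$, with $v \neq v'$, we have $c(v) \neq c(v')$. The chromatic number $\chi(G)$ is the smallest $|\mathcal{C}|$ such that there exists a coloring $c: \mathcal{V} \to \mathcal{C}$ of $G$.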
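A one-shot fixed-length zero-error code sends the color of $x$ and therefore needs $\lceil \log_2 \chi(G) \rceil$ bits per symbol. The following Python sketch, with illustrative function names, computes $\chi$ by exhaustive backtracking, which takes exponential time and is only suitable for very small graphs such as the pentagon.

```python
from math import ceil, log2

def chromatic_number(n, edges):
    """Smallest k admitting a proper k-coloring of vertices 0..n-1 (exhaustive backtracking)."""
    nbrs = [set() for _ in range(n)]
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)

    def colorable(k):
        color = [-1] * n  # -1 means "not colored yet"

        def assign(v):
            if v == n:
                return True
            for c in range(k):
                if all(color[u] != c for u in nbrs[v]):
                    color[v] = c
                    if assign(v + 1):
                        return True
            color[v] = -1  # undo before backtracking
            return False

        return assign(0)

    k = 1 if n else 0
    while not colorable(k):
        k += 1
    return k

pentagon = [(i, (i + 1) % 5) for i in range(5)]
chi = chromatic_number(5, pentagon)
print(chi, ceil(log2(chi)))  # 3 2: three colors, i.e. a 2-bit fixed-length zero-error code
```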
For sequences of symbols with underlying distribution $P_X^{\otimes n}$, two sequences of source inputs $x^n \neq x'^n$ are adjacent in the characteristic graph if $\prod_{t=1}^{n} P_{Y|X}(y_t|x_t)\, P_{Y|X}(y_t|x'_t) > 0$ for some sequence of side-information symbols $y^n \in \mathcal{Y}^n$, i.e. if and only if, for every $t$, either $x_t = x'_t$ or $x_t \sim x'_t$. This implies that for sequences of symbols, the characteristic graph is built by using the AND product of graphs, denoted by $\wedge$, also called “strong product” or “normal product” in [27, 26], and defined below.
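Definition 4 (AND product $\wedge$).
Let $G_1 = (\mathcal{V}_1, \mathcal{E}_1, P_1)$ and $G_2 = (\mathcal{V}_2, \mathcal{E}_2, P_2)$ be two probabilistic graphs; their AND product $G_1 \wedge G_2$ is a probabilistic graph defined by:
-
$\mathcal{V}_1 \times \mathcal{V}_2$ as set of vertices,
-
$(v_1, v_2) \neq (v'_1, v'_2)$ are adjacent if $v_1 \sim v'_1$ and $v_2 \sim v'_2$, with the convention of self-adjacency for all vertices,
-
$P_1 \otimes P_2$ as probability distribution on the vertices.
We denote by $G^{\wedge n}$ the $n$-th AND power: $G^{\wedge n} = G \wedge \dots \wedge G$, $n$ times.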
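Definition 4 translates directly into code. The Python sketch below is illustrative: representing a probabilistic graph as a vertex list, an adjacency predicate and a probability dictionary is an assumption made here. It builds $C_5 \wedge C_5$ with the product of uniform distributions and checks that every vertex has $3 \cdot 3 - 1 = 8$ neighbors, as expected from the self-adjacency convention.

```python
from itertools import product

def and_product(g1, g2):
    """AND product of probabilistic graphs given as (vertices, adjacent, prob), where
    adjacent(a, b) tells whether distinct a, b are adjacent and prob maps v to P(v)."""
    (v1, adj1, p1), (v2, adj2, p2) = g1, g2

    def close1(a, b):  # equal or adjacent in G1 (self-adjacency convention)
        return a == b or adj1(a, b)

    def close2(a, b):  # equal or adjacent in G2
        return a == b or adj2(a, b)

    vertices = list(product(v1, v2))
    edges = {frozenset((u, w)) for i, u in enumerate(vertices) for w in vertices[i + 1:]
             if close1(u[0], w[0]) and close2(u[1], w[1])}
    prob = {(a, b): p1[a] * p2[b] for a, b in vertices}  # product distribution P1 x P2
    return vertices, edges, prob

def cycle(q):
    """Cycle C_q with the uniform distribution on its vertices."""
    def adjacent(a, b):
        return (a - b) % q in (1, q - 1)
    return list(range(q)), adjacent, {v: 1 / q for v in range(q)}

vertices, edges, prob = and_product(cycle(5), cycle(5))
degree = {v: sum(v in e for e in edges) for v in vertices}
print(len(vertices), len(edges), set(degree.values()))  # 25 100 {8}
```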
Unlike the vanishing error regime, there is no single-letter characterization of the optimal rate in the zero-error regime. We present two different asymptotic expressions which rely on codebooks composed of codewords that form a coloring of the AND product of the characteristic graph .
Alon and Orlitsky introduced in [16] an asymptotic expression for the optimal rate in the “restricted inputs” setting. It relies on the notion of chromatic entropy $H_\chi(G, P_X)$, which is the minimal entropy of a coloring of $G$.
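Definition 5 (Chromatic entropy $H_\chi$).
The chromatic entropy of the probabilistic graph $(G, P_X)$ is defined by
$$H_\chi(G, P_X) = \min\big\{ H\big(c(X)\big) : c \text{ is a coloring of } G \big\}, \quad X \sim P_X.$$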
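For very small graphs, the minimum in Definition 5 can be found by exhaustive search. The Python sketch below (illustrative, exponential-time) evaluates $H_\chi$ for the pentagon $C_5$ with uniform distribution: the best colorings use color classes of sizes $2, 2, 1$, giving $H(0.4, 0.4, 0.2) \approx 1.52$ bits, below both $\log_2 \chi(C_5) = \log_2 3$ and $H(P_X) = \log_2 5$.

```python
from itertools import product
from math import log2

def chromatic_entropy(n, edges, p):
    """H_chi(G, P): minimum entropy of c(X) over proper colorings c of vertices 0..n-1.
    Exhaustive over all n**n maps, so only usable for tiny graphs."""
    best = float("inf")
    for c in product(range(n), repeat=n):
        if any(c[u] == c[v] for u, v in edges):
            continue  # adjacent vertices share a color: not a coloring
        mass = {}
        for v in range(n):
            mass[c[v]] = mass.get(c[v], 0.0) + p[v]  # P(c(X) = color)
        best = min(best, -sum(m * log2(m) for m in mass.values() if m > 0))
    return best

pentagon = [(i, (i + 1) % 5) for i in range(5)]
h = chromatic_entropy(5, pentagon, [0.2] * 5)
print(round(h, 4), round(log2(3), 4), round(log2(5), 4))  # 1.5219 1.585 2.3219
```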
Even though there is no single-letter expression for this asymptotic rate, Alon and Orlitsky provided in [16] a single-letter upper bound by adding a constraint called “unrestricted inputs”. This constraint requires the zero-error property to be satisfied even for the sequences of symbols that lie outside the source’s support.
Figure 3: The pentagon graph $C_5$ with uniform distribution over the vertices.
With high probability, the source sequence $X^n$ is typical with respect to $P_X$. Let $G^{\wedge n}[\mathcal{T}^n_\epsilon]$ be the subgraph of $G^{\wedge n}$ induced by the set $\mathcal{T}^n_\epsilon$ of typical sequences with tolerance $\epsilon$, see [7, Definition 2.8]. The zero-error code consists of a coloring of this induced subgraph with the minimum number of colors $\chi(G^{\wedge n}[\mathcal{T}^n_\epsilon])$, where $\chi$ denotes the chromatic number of the graph. The encoder sends the color index to the decoder if $x^n$ is typical; otherwise it sends the index of the sequence $x^n$ in $\mathcal{X}^n$. This coding strategy has a rate upper-bounded by
(8)
The zero-error property is satisfied since the decoder is able to retrieve $x^n$ from $y^n$ and the color symbol. Koulgi et al. have shown in [17, Theorem 1] that taking the limit when $n$ goes to infinity and $\epsilon$ goes to $0$ yields the best achievable rate in the zero-error side-information problem. This quantity, introduced by Körner and Longo in [18], is called the complementary graph entropy.
Definition 6.
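For all probabilistic graphs $(G, P_X)$, the complementary graph entropy is defined by:
$$\overline{H}(G, P_X) = \lim_{\epsilon \to 0}\ \limsup_{n \to \infty}\ \frac{1}{n} \log \chi\big(G^{\wedge n}[\mathcal{T}^n_\epsilon]\big),$$
where $\mathcal{T}^n_\epsilon$ denotes the set of $P_X$-typical sequences of length $n$ with tolerance $\epsilon$.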
where $(G, P_X)$ is the probabilistic graph formed of the characteristic graph $G$ associated to the distribution $P_{Y|X}$, with the underlying distribution $P_X$ on its vertices.
A trivial single-letter upper bound is $\overline{H}(G, P_X) \le H(P_X)$, where the zero-error coding construction relies on Huffman coding and the decoder ignores the side information $Y$. In fact, this upper bound is tight for a dense subset of distributions in $\Delta(\mathcal{X} \times \mathcal{Y})$.
Since the distribution $P_{XY}$ has full support, the characteristic graph $G$ is complete, i.e. every pair of symbols $x \neq x'$ is adjacent in $G$; thus $\overline{H}(G, P_X) = H(P_X)$, which concludes the proof of Prop. 1.
∎
There are a few other cases where the optimal zero-error rate is known, such as perfect graphs, or the pentagon $C_5$ with the uniform distribution $P_u$, shown in Fig. 3, for which $\overline{H}(C_5, P_u) = \frac{1}{2}\log 5$. In general, the single-letter characterization of $\overline{H}$ remains a difficult open question.
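Assuming Marton's formula $\overline{H}(G, P) = H(P) - C_0(G, P)$, Lovász's value $C_0(C_5) = \frac{1}{2}\log 5$, and the fact, shown later in the paper for vertex-transitive graphs, that the uniform distribution $P_u$ achieves the zero-error capacity of $C_5$, the value for the pentagon follows from a short computation (logarithms in base 2):

$$\overline{H}(C_5, P_u) = H(P_u) - C_0(C_5, P_u) = \log 5 - \tfrac{1}{2}\log 5 = \tfrac{1}{2}\log 5 \approx 1.161 \text{ bits}.$$

This is well below $H(P_u) = \log 5 \approx 2.322$ bits, the rate of a Huffman code that ignores the side information.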
In order to understand the problem’s difficulty, we examine a specific scenario where the source and side information decompose into independent variables. In the vanishing error regime, independence is a key assumption that induces the linearization of optimal rates, shedding light on practical coding techniques. In the zero-error regime, the independence hypothesis alone is insufficient for linearization of optimal rates. We specify a hypothesis that ensures linearization, enabling us to enlarge the set of problems for which the optimal rate has a single-letter characterization.
Figure 4: Independent side-information problems
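More formally, for a finite set $\mathcal{I}$, we assume a family of pairs $(X_i, Y_i)_{i\in\mathcal{I}}$, referred to as an independent family, whose joint distribution decomposes as the product of distributions $\prod_{i\in\mathcal{I}} P_{X_iY_i}$. This independent family generates sequences of i.i.d. random variables,
$$\big(X_i^n, Y_i^n\big)_{i\in\mathcal{I}} \sim \prod_{t=1}^{n} \prod_{i\in\mathcal{I}} P_{X_iY_i}. \tag{11}$$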
Independent side-information problems correspond to a side-information problem in which the source and the side information form an independent family, as shown in Fig. 4. In the vanishing error regime, the optimal rate linearizes:
$$H\big((X_i)_{i\in\mathcal{I}} \,\big|\, (Y_i)_{i\in\mathcal{I}}\big) = \sum_{i\in\mathcal{I}} H(X_i | Y_i). \tag{12}$$
This property is fundamental because it means that the construction of an optimal codebook results from the concatenation of the codewords of the optimal codebooks for each subproblem.
But does linearization also hold in the zero-error regime for independent side-information problems? To answer this question, we first derive an asymptotic expression for the optimal zero-error rate. This derivation follows from the fact that the independent family can be characterized by a product of graphs.
Proposition 2.
Let $(X_i, Y_i)_{i\in\mathcal{I}}$ be an independent family. The optimal rate for the independent zero-error side-information problems is
(13)
where for all $i \in \mathcal{I}$, $G_i$ is the characteristic graph associated to the conditional distribution $P_{Y_i|X_i}$, with the underlying probability distribution $P_{X_i}$ on its vertices.
It is known that the complementary graph entropy is sublinear with respect to the AND product. Indeed, [21, Theorem 2] states that for all probabilistic graphs $G_1$ and $G_2$,
$$\overline{H}(G_1 \wedge G_2) \le \overline{H}(G_1) + \overline{H}(G_2). \tag{14}$$
However, $\overline{H}$ does not linearize in general. Inspired by Haemers' result [20], we show in Theorem 19 that the inequality (14) is strict for the Schläfli graph $G_S$ and its complement $\overline{G_S}$.
In the following, we study a condition that allows for the linearization of , i.e. where (14) holds with equality.
To do this, we introduce the disjoint union of graphs, also called “sum of graphs” in [21].
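Definition 7 (Disjoint union of probabilistic graphs $\sqcup$).
Let $\mathcal{I}$ be a finite set, let $\lambda \in \Delta(\mathcal{I})$, and let $G_i = (\mathcal{V}_i, \mathcal{E}_i, P_i)$ be probabilistic graphs, for all $i \in \mathcal{I}$. The disjoint union with respect to $\lambda$ is a probabilistic graph denoted by $\bigsqcup_{i\in\mathcal{I}}^{\lambda} G_i$ and defined by:
-
$\mathcal{V} = \bigsqcup_{i\in\mathcal{I}} \mathcal{V}_i$ is the disjoint union of the sets $\mathcal{V}_i$;
-
For all $v, v' \in \mathcal{V}$, $v \sim v'$ if and only if they both belong to the same $\mathcal{V}_i$ and $v \sim v'$ in $G_i$;
-
$P = \sum_{i\in\mathcal{I}} \lambda_i P_i$; note that the distributions $P_i$ have disjoint supports in $\mathcal{V}$.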
The disjoint union of graphs without probability distribution has the vertex set and edges defined above, without underlying probability distribution.
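Definition 7 can be implemented by tagging each vertex with the index of its component. In the Python sketch below (illustrative names and graph representation, weights chosen arbitrarily), the disjoint union of an empty graph and a complete graph on two vertices each, the two building blocks of Fig. 5, has a single edge, and its vertex distribution is the mixture $\sum_i \lambda_i P_i$.

```python
def disjoint_union(graphs, lam):
    """Disjoint union of probabilistic graphs (Definition 7).
    graphs: list of (vertices, edges, prob), each edge being a frozenset of two vertices;
    lam: weights lambda_i summing to 1. Vertex v of graph i becomes (i, v)."""
    vertices, edges, prob = [], set(), {}
    for i, ((verts, edgs, p), w) in enumerate(zip(graphs, lam)):
        vertices += [(i, v) for v in verts]
        # edges stay inside their component: vertices of different G_i are never adjacent
        edges |= {frozenset((i, v) for v in e) for e in edgs}
        prob.update({(i, v): w * p[v] for v in verts})  # P = sum_i lambda_i P_i
    return vertices, edges, prob

empty = ([0, 1], set(), {0: 0.5, 1: 0.5})                   # empty graph on 2 vertices
complete = ([0, 1], {frozenset((0, 1))}, {0: 0.5, 1: 0.5})  # complete graph on 2 vertices
vertices, edges, prob = disjoint_union([empty, complete], [1 / 3, 2 / 3])
print((len(vertices), len(edges)))  # (4, 1)
```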
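An example of an AND product and a disjoint union of probabilistic graphs is shown in Fig. 5. Note that, as with the AND product, the complementary graph entropy is sublinear with respect to the disjoint union. Indeed, [21, Theorem 2] states that for all probabilistic graphs $G_1$ and $G_2$ and all $\lambda \in [0, 1]$,
$$\overline{H}\big(G_1 \sqcup_{\lambda} G_2\big) \le \lambda\, \overline{H}(G_1) + (1 - \lambda)\, \overline{H}(G_2), \tag{15}$$
where $G_1 \sqcup_{\lambda} G_2$ denotes the disjoint union with respect to the weights $(\lambda, 1 - \lambda)$.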
Figure 5: An empty graph and a complete graph, along with their AND product and their disjoint union with respect to $\lambda$. The underlying distributions are represented by the numbers on each vertex.
We present our first main result, which is that the linearization of the complementary graph entropy with respect to the AND product holds if and only if the linearization holds with respect to the disjoint union. An important consequence of this linearization property is that the concatenation of optimal codes for the subproblems is optimal.
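Theorem 4 (Equivalence of the linearizations for $\wedge$ and $\sqcup$).
Let $\mathcal{I}$ be a finite set, $\lambda \in \Delta(\mathcal{I})$ a distribution with full support, and $(G_i)_{i\in\mathcal{I}}$ a family of probabilistic graphs. The following equivalence holds:
$$\overline{H}\Big(\bigwedge_{i\in\mathcal{I}} G_i\Big) = \sum_{i\in\mathcal{I}} \overline{H}(G_i) \tag{16}$$
$$\iff \quad \overline{H}\Big(\bigsqcup_{i\in\mathcal{I}}^{\lambda} G_i\Big) = \sum_{i\in\mathcal{I}} \lambda_i\, \overline{H}(G_i). \tag{17}$$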
We say that the linearization property holds if (16) or, equivalently, (17) is satisfied.
An important consequence of this theorem is that it provides single-letter characterizations of optimal rates for new sources, as discussed in Sec. IV. For example, it characterizes the optimal rate for the AND product of perfect graphs, which is not perfect in general. Indeed, the disjoint union of perfect graphs is perfect, so the optimal rate linearizes for the disjoint union, and by Theorem 4 the optimal rate for the AND product of perfect graphs is equal to the sum of the rates of the individual graphs.
Without loss of generality, we consider that $\mathcal{I}$ is the support of $\lambda$. We observe that (16) does not depend on the distribution $\lambda$; therefore, if (17) holds for one distribution $\lambda$ with full support, then it holds for all distributions with full support. This remark, along with the continuity of the function (19) stated in Lemma 2 below, allows us to state the following corollary.
Corollary 1.
If the linearization properties (16) or (17) hold for a family of probabilistic graphs $(G_i)_{i\in\mathcal{I}}$, then (16) and (17) also hold for any subfamily $(G_i)_{i\in\mathcal{J}}$ with $\mathcal{J} \subseteq \mathcal{I}$.
The proof of Theorem 4 is given in App. A. It relies on Lemma 1, which follows from the definition of $\overline{H}$, which involves the AND power of graphs, and from the distributivity of $\wedge$ with respect to $\sqcup$.
Lemma 1.
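$$\overline{H}\Big(\bigsqcup_{i\in\mathcal{I}}^{u} G_i\Big) = \frac{1}{|\mathcal{I}|}\, \overline{H}\Big(\bigwedge_{i\in\mathcal{I}} G_i\Big), \quad \text{where } u \text{ is the uniform distribution on } \mathcal{I}. \tag{18}$$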
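The combinatorial fact behind Lemma 1 can be observed on a small example. In an AND power of a disjoint union, two sequences that use different components at some position are never adjacent, so the sequences that use each graph exactly once split into non-adjacent blocks, one per position pattern, each isomorphic to the AND product of the graphs. The Python sketch below checks this by brute force for two hypothetical graphs, a complete graph on two vertices and a path on three vertices; it only illustrates the decomposition and does not reproduce the typicality arguments of the proof in App. A.

```python
from itertools import product

def and_graph(vertex_lists, adj):
    """AND product of graphs on the given vertex lists; adj is a set of frozenset edges.
    Distinct tuples are adjacent iff each coordinate pair is equal or adjacent."""
    seqs = list(product(*vertex_lists))
    edges = {frozenset((s, t)) for i, s in enumerate(seqs) for t in seqs[i + 1:]
             if all(a == b or frozenset((a, b)) in adj for a, b in zip(s, t))}
    return seqs, edges

def chromatic_number(vertices, edges):
    """Smallest number of colors of a proper coloring (backtracking, tiny graphs only)."""
    vertices = list(vertices)
    nbrs = {v: set() for v in vertices}
    for e in edges:
        u, w = tuple(e)
        nbrs[u].add(w)
        nbrs[w].add(u)

    def colorable(k, color, i=0):
        if i == len(vertices):
            return True
        v = vertices[i]
        for c in range(k):
            if all(color.get(u) != c for u in nbrs[v]):
                color[v] = c
                if colorable(k, color, i + 1):
                    return True
                del color[v]
        return False

    k = 1
    while not colorable(k, {}):
        k += 1
    return k

def pattern(s):
    """Which component ('a' for G1, 'b' for G2) each position of the sequence uses."""
    return tuple(x[0] for x in s)

# Hypothetical components: G1 = complete graph on {a0, a1}, G2 = path b0 - b1 - b2.
V1, V2 = ["a0", "a1"], ["b0", "b1", "b2"]
ADJ = {frozenset(("a0", "a1")), frozenset(("b0", "b1")), frozenset(("b1", "b2"))}

# AND square of the disjoint union, restricted to sequences using each component once.
seqs, edges = and_graph([V1 + V2, V1 + V2], ADJ)
mixed = [s for s in seqs if sorted(pattern(s)) == ["a", "b"]]
mixed_set = set(mixed)
mixed_edges = {e for e in edges if e <= mixed_set}
cross = [e for e in mixed_edges if len({pattern(s) for s in e}) > 1]

_, prod_edges = and_graph([V1, V2], ADJ)  # G1 AND G2 itself
print(len(mixed), len(cross), chromatic_number(mixed, mixed_edges),
      chromatic_number(product(V1, V2), prod_edges))  # 12 0 4 4
```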
According to Lemma 1, the complementary graph entropies of the AND product and of the disjoint union of graphs are proportional when $\lambda$ is the uniform distribution. This shows the equivalence between (17) and (16) when $\lambda$ is uniform. In order to extend this argument to all distributions $\lambda$, we need to show the convexity of the function
$$\lambda \in \Delta(\mathcal{I}) \longmapsto \overline{H}\Big(\bigsqcup_{i\in\mathcal{I}}^{\lambda} G_i\Big). \tag{19}$$
Lemma 2.
The function is convex and -Lipschitz.
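The proof of Lemma 2 is given in App. A-B. Assume that the linearization (17) holds for the uniform distribution. The function (19) is convex and, by the sublinearity of $\overline{H}$ with respect to the disjoint union (15), lies below the linear function $\lambda \mapsto \sum_{i\in\mathcal{I}} \lambda_i \overline{H}(G_i)$, with equality at the uniform distribution, which is an interior point of $\Delta(\mathcal{I})$. Hence the function (19) coincides with this linear function on the whole simplex, i.e. it linearizes for all distributions $\lambda$. This argument, together with Lemma 1, proves Theorem 4.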
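The step from the uniform distribution to all $\lambda$ can be spelled out as follows. This is a sketch under the assumptions that the sublinearity (15) holds for the whole family $(G_i)_{i\in\mathcal{I}}$ with $|\mathcal{I}| \ge 2$ and that the function (19) is convex (Lemma 2); $u$ denotes the uniform distribution on $\mathcal{I}$, and $g$ and $\mu$ are auxiliary notations introduced only here:

$$g(\lambda) := \overline{H}\Big(\bigsqcup_{i\in\mathcal{I}}^{\lambda} G_i\Big) - \sum_{i\in\mathcal{I}} \lambda_i\, \overline{H}(G_i) \le 0, \qquad g(u) = 0.$$

For $\lambda \neq u$, take $t \in (0, 1/|\mathcal{I}|)$ and $\mu := \frac{u - t\lambda}{1 - t}$, which lies in $\Delta(\mathcal{I})$ because $u_i = 1/|\mathcal{I}| > t \ge t\lambda_i$. Since $u = t\lambda + (1-t)\mu$ and $g$ is convex,

$$0 = g(u) \le t\, g(\lambda) + (1 - t)\, g(\mu) \le t\, g(\lambda) \le 0,$$

hence $g(\lambda) = 0$, i.e. (17) holds for every $\lambda \in \Delta(\mathcal{I})$.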
More precisely, the complementary graph entropies of the AND product and of the disjoint union of graphs remain proportional on a dense subset of $\Delta(\mathcal{I})$, namely the set of types.
Lemma 3.
Let $\lambda$ be a type of a sequence of length $k$. We have,
(20)
The proof of Lemma 3 is given in App. A-A; note that Lemma 1 is a special case of Lemma 3. The equivalence between (16) and (17) is thus satisfied for every type $\lambda$ of sequences of length $k$; moreover, the function (19) is continuous on $\Delta(\mathcal{I})$ and the types are dense in $\Delta(\mathcal{I})$, so this argument also proves Theorem 4.
II-D Zero-error source coding with side information at the decoder and partial side information at the encoder
In Sec. II-B, we highlighted the significance of the disjoint union of graphs. Indeed, using Theorem 4, we can show the linearization for the disjoint union in order to conclude on the linearization for the AND product. This is simpler because the disjoint union has fewer vertices and edges than the AND product. Moreover, the disjoint union is of interest in its own right, as it corresponds to the problem in Fig. 6, where the encoder has partial information about the decoder’s side information $Y$, obtained through a deterministic function of $Y$. The setting of Fig. 6 is a specific case of Fig. 2, equivalent to a source that the decoder must retrieve. We refer to this as the partial-side-information problem.
Figure 6: The partial-side-information problem.
Proposition 3.
When the encoder has partial side information, the optimal rate is
(21)
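where, for each realization of the encoder side information, the corresponding graph is the characteristic graph associated to the conditional distribution of $Y$ given $X$ and this realization, with the conditional distribution of $X$ given this realization as underlying probability distribution on its vertices. These conditional distributions are obtained from the joint distribution $P_{XY}$ and the deterministic function.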
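Indeed, for each realization of the encoder side information, we construct a characteristic graph to model the corresponding sub-problem. Since both the encoder and the decoder have access to this realization, the overall characteristic graph consists of the disjoint union of these graphs. Moreover, each of these graphs contains all realizations $x \in \mathcal{X}$, and there is an edge between two vertices $x$ and $x'$ if and only if some side-information symbol $y$ consistent with the realization has positive probability under both $x$ and $x'$.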
If the disjoint union linearizes, then the following coding scheme is optimal:
•
For each value of the encoder side information, select the indices at which the encoder side-information sequence takes this value, and extract the corresponding subsequence of the source sequence.
•
For each value, use the optimal codebook for the corresponding subsequence, whose length depends on the value, and concatenate the codewords obtained.
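The splitting step of this scheme can be sketched in Python. Here `z_seq` stands for the sequence of encoder side-information symbols, and `sub_encoders` is a hypothetical dictionary of per-value encoders standing in for the optimal codebooks, which are not constructed here. Since the decoder observes the full side information, it also knows the positions attached to each value and can invert the splitting with `multiplex` once each subsequence is decoded.

```python
from collections import defaultdict

def demultiplex(x_seq, z_seq):
    """Split x_seq by the encoder side-information symbols z_seq.
    Returns {z: (positions, subsequence of x at those positions)}."""
    parts = defaultdict(lambda: ([], []))
    for t, (x, z) in enumerate(zip(x_seq, z_seq)):
        parts[z][0].append(t)
        parts[z][1].append(x)
    return dict(parts)

def encode(x_seq, z_seq, sub_encoders):
    """Concatenate the codewords of the per-z codebooks in a fixed (sorted) order of z.
    sub_encoders[z] is a hypothetical encoder for the subproblem indexed by z."""
    parts = demultiplex(x_seq, z_seq)
    return "".join(sub_encoders[z](tuple(parts[z][1])) for z in sorted(parts))

def multiplex(parts, n):
    """Inverse of demultiplex: put each subsequence back at its positions."""
    x = [None] * n
    for positions, sub in parts.values():
        for t, v in zip(positions, sub):
            x[t] = v
    return x

toy = {0: "".join, 1: lambda s: "".join(s).upper()}  # placeholders, not real codebooks
print(encode("abcd", [0, 1, 0, 1], toy))  # acBD
```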
With high probability, the sequence of encoder side-information symbols is typical; therefore, its empirical distribution converges in probability to the underlying distribution. The coding rate of the above scheme converges to
(22)
which is optimal for the partial-side-information problem.
III Zero-Error Channel Coding Problem
Recently, Wigderson and Zuiddam in [22] and Schrijver in [23] established the equivalence between the linearization of the zero-error capacity $C_0$ for the AND product and for the disjoint union of graphs. The main difference with the side-information problem is that the channel input distribution is a priori not specified in the zero-error channel coding problem.
Figure 7: Equivalences of linearization properties between the zero-error capacity $C_0$, the zero-error capacity $C_0(\cdot, P)$ relative to a distribution $P$, and the complementary graph entropy $\overline{H}$. Our contributions are represented in the dashed rectangles.
In this section, we connect these two zero-error problems by exploring the properties of the zero-error capacity $C_0(G, P)$ of a graph $G$ relative to a distribution $P$, introduced by Csiszár and Körner [24]. We show the equivalence of the linearization properties of [22, Theorem 4.1] and [23, Theorem 2], and of the linearization properties of Theorem 4, provided that the source distribution maximizes the zero-error capacity relative to a distribution of the AND product of graphs. The path taken to demonstrate these equivalences is shown in Fig. 7.
III-A Zero-error channel capacity
The channel coding problem in Fig. 8 was introduced in [1] in the vanishing error regime, and in [3] in the zero-error regime. We consider a Discrete Memoryless Channel (DMC) that consists of a finite input alphabet $\mathcal{X}$, a finite output alphabet $\mathcal{Y}$ and a conditional distribution (a.k.a. transition probability) $P_{Y|X}$. An $(n, M)$-code consists of
-
an encoder that selects uniformly a codeword $x^n$ from a codebook $\mathcal{C} \subseteq \mathcal{X}^n$ of size $M$, and sends it over the DMC,
-
a decoder that assigns an estimate $\hat{x}^n \in \mathcal{C}$ to each received sequence $y^n \in \mathcal{Y}^n$.
The rate of the code is the number of bits transmitted per channel use, i.e. $\frac{1}{n}\log M$, and the probability of error is $P_e = \mathbb{P}(\hat{X}^n \neq X^n)$.
Figure 8: The channel coding problem.
Definition 8.
The channel capacity is the maximal rate among all codes that satisfy the $\epsilon$-error property $P_e \le \epsilon$, with $\epsilon > 0$:
(23)
The zero-error channel capacity is the maximal rate among all coding schemes that satisfy the zero-error property:
Let $G$ be the characteristic graph corresponding to the DMC $P_{Y|X}$. The zero-error channel capacity satisfies
$$C_0(G) = \lim_{n \to \infty} \frac{1}{n} \log \alpha\big(G^{\wedge n}\big). \tag{26}$$
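Equation (26) suggests computing lower bounds on $C_0$ from small AND powers, since $\frac{1}{n}\log\alpha(G^{\wedge n}) \le C_0(G)$ for every $n$ by superadditivity. The Python sketch below (illustrative function names; exact but exponential-time branching on a maximum-degree vertex) does this for the pentagon: $n = 1$ gives 1 bit, while $n = 2$ gives $\frac{1}{2}\log_2 5 \approx 1.161$ bits, which by Lovász's bound already equals $C_0(C_5)$. The explicit length-2 code $\{(i, 2i \bmod 5)\}$ is Shannon's classical construction.

```python
from itertools import product
from math import log2

def and_power_neighbors(q, n):
    """Neighbor sets of the n-th AND power of the cycle C_q (self-adjacency convention)."""
    verts = list(product(range(q), repeat=n))

    def close(a, b):  # equal or adjacent in C_q
        return (a - b) % q in (0, 1, q - 1)

    return {s: {t for t in verts if t != s and all(close(a, b) for a, b in zip(s, t))}
            for s in verts}

def max_independent_set(vertices, nbrs):
    """Exact maximum independent set of the subgraph induced by the set `vertices`,
    branching on a vertex of maximum degree (exponential time, small graphs only)."""
    if not vertices:
        return set()
    v = max(vertices, key=lambda u: len(nbrs[u] & vertices))
    if not nbrs[v] & vertices:
        return set(vertices)  # no edge left: the whole set is independent
    without_v = max_independent_set(vertices - {v}, nbrs)
    with_v = {v} | max_independent_set(vertices - {v} - nbrs[v], nbrs)
    return with_v if len(with_v) > len(without_v) else without_v

for n in (1, 2):
    nbrs = and_power_neighbors(5, n)
    alpha = len(max_independent_set(set(nbrs), nbrs))
    print(n, alpha, round(log2(alpha) / n, 3))  # prints "1 2 1.0", then "2 5 1.161"

# Shannon's length-2 code for C5: the 5 codewords (i, 2i mod 5) are pairwise non-adjacent.
nbrs2 = and_power_neighbors(5, 2)
code = [(i, 2 * i % 5) for i in range(5)]
print(all(t not in nbrs2[s] for s in code for t in code))  # True
```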
Remark 1.
Note that, by convention, we define the zero-error capacity with the logarithm. Another existing convention (for example in [27]) for the zero-error capacity is $\Theta(G) = \lim_{n\to\infty} \alpha(G^{\wedge n})^{1/n}$, which is equivalent in the sense that $C_0(G) = \log \Theta(G)$.
We present in Sec. IV-A some examples from the literature where $C_0$ is known, in particular when $G$ is a perfect graph. The Lovász function $\vartheta$, introduced in [27], is an upper bound on the zero-error capacity. This function is used to show that $C_0(C_5) = \frac{1}{2}\log 5$, which makes $C_5$ the smallest non-perfect graph for which $C_0$ is known. Further observations on the function $\vartheta$ are derived by Sason in [28]. The zero-error capacity of $C_7$ is still unknown. Several existing lower bounds on $C_0(C_7)$ were found via computer search, in particular in [29], [30] and [31].
To understand why no single-letter characterization exists for the zero-error channel capacity, we study, similarly to Sec. II-B, the case where the channel transition probability decomposes as a product $\prod_{i\in\mathcal{I}} P_{Y_i|X_i}$, as depicted in Fig. 9. We refer to this setting as independent channel coding problems.
Figure 9: Independent channel coding problems: the information is transmitted via parallel channels $(P_{Y_i|X_i})_{i\in\mathcal{I}}$.
The zero-error capacity of independent channels is given by
$$C_0\Big(\bigwedge_{i\in\mathcal{I}} G_i\Big), \tag{27}$$
where for all $i \in \mathcal{I}$, $G_i$ is the characteristic graph associated to the conditional distribution $P_{Y_i|X_i}$.
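In the vanishing error regime, the capacity of independent channels linearizes since
$$\max_{P_{(X_i)_{i\in\mathcal{I}}}} I\big((X_i)_{i\in\mathcal{I}}; (Y_i)_{i\in\mathcal{I}}\big) = \sum_{i\in\mathcal{I}} \max_{P_{X_i}} I(X_i; Y_i). \tag{28}$$
Therefore, it is optimal to concatenate the optimal codebooks designed for each channel $P_{Y_i|X_i}$.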
In the zero-error regime, the capacity is super-linear, as shown by Shannon in [3, Theorem 4]:
$$C_0(G_1 \wedge G_2) \ge C_0(G_1) + C_0(G_2). \tag{29}$$
Haemers showed in [20] that the inequality (29) is strict for the product of the Schläfli graph $G_S$ and its complementary graph $\overline{G_S}$, as stated in Theorem 7. An explicit construction of the Schläfli graph is provided in [32, Sec. 6.1]. As we will see in Sec. IV-C, Haemers’ result relies on a bound on the zero-error capacity based on the rank of the adjacency matrix of the graph. Refinements of this bound were developed by Bukh and Cox in [33], and by Gao et al. in [34].
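Let $G_S$ be the Schläfli graph and $\overline{G_S}$ its complementary graph; then
$$C_0\big(G_S \wedge \overline{G_S}\big) > C_0(G_S) + C_0\big(\overline{G_S}\big). \tag{30}$$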
Since the linearization property does not hold in general, Wigderson and Zuiddam in [22] and Schrijver in [23] recently established a condition under which capacity linearization holds.
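Theorem 8 (from [22, Theorem 4.1] and [23, Theorem 2]).
For all graphs $G_1$ and $G_2$,
$$C_0(G_1 \wedge G_2) = C_0(G_1) + C_0(G_2) \tag{31}$$
$$\iff \quad C_0(G_1 \sqcup G_2) = \log\big(2^{C_0(G_1)} + 2^{C_0(G_2)}\big). \tag{32}$$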
We establish the connection between the linearization properties of Theorem 4 and Theorem 8, by using the zero-error capacity of a graph relative to a distribution , introduced by Csiszár and Körner in [24].
III-C Zero-error capacity of a graph relative to a distribution and equivalence of the linearizations of $C_0(\cdot, P)$ and $\overline{H}$
Theorems 4 and 8 both establish the equivalence of linearizations for the AND product and the disjoint union of graphs, but for different quantities: $\overline{H}$ and $C_0$. We will show the equivalence between these two quantities and then deduce the equivalence between the two linearizations. To do this, we study the zero-error capacity $C_0(G, P)$ of a graph $G$ relative to a distribution $P$.
Definition 10.
A sequence of codes is said to be typical with respect to ,
or in short -typical, if
(33)
The zero-error channel capacity relative to the input distribution is the maximal rate among all -typical sequences of codes that satisfy the zero-error property:
(34)
Csiszár and Körner [24, Eq. (3.2)] derive an asymptotic expression of this capacity, which we review below.
Given the graph with the probability distribution ,
(36)
We can interpret the formula in Theorem 9 as follows. The quantities and are, respectively, the minimum number of colors and the maximum size of an independent set. A color class, i.e. the set of vertices sharing the same color, is an independent set: if all color classes had the same size, we would need bits to describe the source sequence within its color class. Therefore, can be seen as the average number of bits needed to describe the index of the source sequence within its color class. These two quantities sum up to , which is the information needed to describe the source sequence with zero error. Equation (36) can be seen as an analog, for the zero-error regime, of the formula .
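As a worked check of this interpretation, consider the idealized one-shot case of a uniform source on a vertex set whose vertices split into color classes that all have the same size. Here we write |V| for the number of vertices, χ for the number of colors and α for the common class size; these symbols are our own shorthand. Since χα = |V|, the bits spent on the color and the bits spent on the index within the class add up exactly to the entropy of the uniform source:

```latex
\chi \cdot \alpha = |\mathcal{V}|
\quad\Longrightarrow\quad
\log_2 \chi + \log_2 \alpha = \log_2 |\mathcal{V}| = H(U_{\mathcal{V}}),
\qquad U_{\mathcal{V}} \text{ uniform on } \mathcal{V}.
```

In general the color classes have unequal sizes, and Theorem 9 can be read as the asymptotic, distribution-dependent version of this balance.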
We establish below the connection between the linearization properties of and of , where the equivalences in (38) and (40) follow from Marton’s formula in Theorem 9, and the equivalence (39) from Theorem 4. The complete proof is in App. B.
Proposition 5.
Let be a finite set, a distribution with full support, and let be a family of probabilistic graphs. The following equivalence holds:
(37)
(38)
(39)
(40)
III-DCapacity achieving distributions and equivalence of the linearizations of and for the AND product
From the previous equivalences, we present our second main contribution, which shows the equivalence between the linearization properties of and . A key element is the set of input distributions that achieve the zero-error capacity in the sense that . As in the vanishing error regime, it seems optimal to consider codebooks composed of codewords that are typical with respect to the input distribution that maximizes .
The result of [25, Theorem 2], see also [7, Theorem 11.22], is stated for the Sperner capacity of a family of directed graphs, while the statement of [35, Theorem 13.68] is specific to the zero-error capacity of a family of graphs. For the sake of completeness, in App. C we provide a proof of Theorem 10 that does not rely on directed graphs. As a consequence of [26, Lemma 1] and of [25, Theorem 2], the zero-error capacity can be reformulated as
(42)
In order to show the equivalence of the linearization properties between and , we define the set of capacity-achieving distributions.
Definition 11.
Let be a graph. The set of capacity-achieving distributions of is the subset of defined by
(43)
Proposition 6.
For all graphs , the map is concave, and the set of capacity-achieving distributions is convex and nonempty.
The proof of Proposition 6 is stated in App. D, and relies on Theorem 10.
The following theorem is essential for establishing the equivalence of the linearizations depicted in Fig. 7. It states that if a joint distribution achieves capacity, then the product of its marginals also achieves it.
Theorem 11.
If , then .
The proof of Theorem 11 is given in App. E and relies on a codebook-shifting argument: given a codebook composed of codewords that are typical with respect to the joint distribution , we construct a set of permuted codebooks by applying a cyclic permutation only to the first component of each codeword. We concatenate all the permuted codebooks and replicate them times so that the codeword length is equal to . Then, we remove the codewords that are not typical with respect to the product of marginal distributions . We show that this construction has the same rate and preserves the zero-error property, while it modifies the types of the codewords, which become the product of marginals, as desired.
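The identity behind the shifting argument can already be checked on a single codeword: over all n cyclic shifts of the first component, each position of the second component is paired exactly once with every position of the first component, so the joint type of the concatenation is exactly the product of the marginal types. The sketch below is our own illustration with hypothetical helper names; it does not reproduce the full construction of App. E, which draws codewords from the shifted codebooks at random and discards atypical concatenations. Shifting every codeword of a codebook by the same amount is a permutation of coordinates, which is why each shifted codebook stays independent.

```python
from collections import Counter
from fractions import Fraction


def joint_type(pairs):
    """Exact empirical joint distribution (type) of a sequence of symbol pairs."""
    n = len(pairs)
    return {s: Fraction(c, n) for s, c in Counter(pairs).items()}


def shift_first(x1, x2, k):
    """Cyclically shift only the first component by k positions; the second is unchanged."""
    n = len(x1)
    return [(x1[(i + k) % n], x2[i]) for i in range(n)]


def concatenated_shifts(x1, x2):
    """Concatenate the n shifted copies of the codeword (x1, x2): a sequence of length n**2."""
    out = []
    for k in range(len(x1)):
        out.extend(shift_first(x1, x2, k))
    return out


def product_of_marginals(x1, x2):
    """Product of the marginal types of x1 and x2."""
    n = len(x1)
    m1, m2 = Counter(x1), Counter(x2)
    return {(a, b): Fraction(m1[a] * m2[b], n * n) for a in m1 for b in m2}


x1, x2 = ["a", "a", "b"], [0, 1, 1]
print(joint_type(list(zip(x1, x2))))            # correlated joint type
print(joint_type(concatenated_shifts(x1, x2)))  # equals product_of_marginals(x1, x2)
```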
We can now establish the equivalence of the linearizations of and for the AND product.
Theorem 12.
Let be a finite set, and be a family of graphs. The following equivalence holds:
(44)
(45)
Furthermore, any distribution that satisfies (45) also satisfies for all .
III-ELinearization of the sum of independent channels
Similarly to the side-information problem, see Sec. II-D, the disjoint union of graphs introduced in Proposition 5 for zero-error channel coding has an operational interpretation as a sum of channels, as depicted in Fig. 10. At each time step , the encoder uses one channel among the channels. The decoder observes an output, deduces the chosen channel, and retrieves its input. Since the output alphabets of the individual channels are disjoint, the channel output symbol uniquely identifies the channel that is used.
Figure 10: Sum of the channels : only the channel is used at instant .
In the vanishing error regime, the linearization of the capacity for the sum of channels holds since
The zero-error capacity of the sum of channels is given by
(47)
In the zero-error regime, Shannon in [3, Theorem 4] shows that
(48)
For the sum of channels, a natural coding scheme consists in using the optimal codebooks for each channel in a time-sharing manner, with respect to the distribution that maximizes . In other words, with this strategy, communicating over the sum channel is equivalent to sending two types of information: one related to identifying the chosen channel, , and the other to the information on this channel, .
Lemma 5.
The map has a unique maximum
(49)
which gives
(50)
The proof of Lemma 5 is given in App. F-B and relies on the fact that the function is the Legendre-Fenchel conjugate [36] of the negative entropy function .
We consider the time-sharing strategy between the optimal codebooks along with the distribution defined in (49). If this strategy is optimal, then
(51)
which means that the linearization property holds for the disjoint union of graphs.
Remark 2.
Note that has full support: it can be observed that has an infinite slope at the boundary of ; consequently, the maximizer of is always an interior point. In other words, the information carried by the channel index offsets the loss in rate, as long as the channels with smaller capacities are not chosen too often. Therefore, in the sum of channels setting, always choosing the channel with the highest capacity is suboptimal, and never choosing a channel is also suboptimal, even if this channel has zero-error capacity equal to .
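A minimal numerical check of Lemma 5 and Remark 2 follows. It assumes, as the time-sharing interpretation above suggests, that the map in Lemma 5 is Q ↦ H(Q) + Σ_i Q(i) C_0(G_i). By Gibbs' inequality its maximizer is Q*(i) = 2^{C_0(G_i)} / Σ_j 2^{C_0(G_j)}, with maximum value log2 Σ_j 2^{C_0(G_j)}, the familiar form of Shannon's formula for a sum of channels. The function names and example capacities are ours; the pentagon value (1/2) log2 5 is Lovász's result [13].

```python
import math
import random


def objective(q, caps):
    """Rate of time sharing with weights q: H(q) + sum_i q_i * caps_i, in bits."""
    entropy = -sum(p * math.log2(p) for p in q if p > 0)
    return entropy + sum(p * c for p, c in zip(q, caps))


def optimal_time_sharing(caps):
    """Closed-form maximizer q_i = 2**caps_i / sum_j 2**caps_j and value log2(sum_j 2**caps_j)."""
    weights = [2.0 ** c for c in caps]
    total = sum(weights)
    return [w / total for w in weights], math.log2(total)


# Example capacities (bits per use): a noiseless binary channel, a useless channel
# (complete graph, zero-error capacity 0), and a channel whose graph is the pentagon.
caps = [1.0, 0.0, math.log2(5) / 2]
q_star, value = optimal_time_sharing(caps)
print(q_star, value)  # the useless channel still gets positive weight (Remark 2)

# Sanity check: no randomly drawn point of the simplex beats the closed form.
rng = random.Random(0)
for _ in range(1000):
    raw = [rng.random() for _ in caps]
    q = [r / sum(raw) for r in raw]
    assert objective(q, caps) <= value + 1e-9
```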
Similarly to Theorem 12, we establish the equivalence between the linearization properties of and for the disjoint union of a family of graphs .
Theorem 13.
The following equivalence holds
(52)
(53)
where and for all . Furthermore, any that satisfies (53) also satisfies the following for all :
(54)
The proof of Theorem 13 is given in App. H. This result relies on Lemma 5 which proves that the distribution maximizes .
Remark 3.
One possible strategy for proving Theorem 13 would be to apply successively the equivalences in Theorem 8, Theorem 12, and Proposition 5.
However, doing so yields the following statement
(55)
(56)
where it remains to link the sets of capacity achieving distributions and .
Theorem 13, together with Theorem 12, Theorem 8 and Proposition 5, establishes the equivalence of the linearization properties of , and for the AND product , and for the disjoint union of a family of graphs , as depicted in Fig. 7.
IV Main Example and Counterexamples for the Linearization of Optimal Rates
In this section, we exploit the equivalences in the linearization depicted in Fig. 7, in order to provide single-letter characterizations of and for several new classes of graphs.
IV-APerfect graphs
We show that perfect graphs allow for the linearization of , and with respect to both and with any underlying probability distribution. Perfect graphs are among the few known classes of graphs with a single-letter formula for and . Theorem 4 allows us to provide new single-letter characterizations for , and for products of perfect graphs, which are not perfect in general.
Definition 12(Graph complement, clique number ).
For all , the complementary graph of is defined by . The clique number of is defined by , where the independence number is stated in Definition 9.
Definition 13(Perfect graph).
A graph is perfect if for every subset of vertices . A probabilistic graph is perfect if is perfect.
A remarkable property of perfect graphs is their single-letter characterizations for zero-error problems. For example, as stated in Theorem 14, for the side-information problem, the optimal rate equals the Körner graph entropy, defined below and introduced in [19], when is a perfect graph.
Definition 14(Körner graph entropy ).
For all , let be the collection of independent sets of vertices in . The Körner graph entropy of is defined by
(57)
where the minimum is taken over all distributions with the constraint that the random vertex belongs to the random independent set with probability one, i.e. in (57).
Similarly, a single-letter characterization holds for the zero-error capacity of perfect graphs, as stated below. This is a consequence of a more general result due to Shannon (see [3, Theorem 3]), which states that a graph whose vertex set can be partitioned into as many cliques, i.e. complete induced subgraphs, as its independence number satisfies . Perfect graphs satisfy this property, as their complement is also perfect, and satisfy , where is the clique number, see [9, p. 382].
We now derive single-letter characterizations of , , and for graphs where these quantities were previously unknown. These characterizations result from the linearization theorems: [22], [23] for , and Theorem 4 for and .
More precisely, consider some perfect graphs. Their disjoint union is perfect, as shown in Lemma 19, and linearizes: since holds for all perfect graphs . According to the linearization theorems [22], [23], since of the disjoint union linearizes, so does of the AND product: . This leads to the following proposition.
Proposition 8.
Let and be perfect graphs, then
(59)
(60)
According to the previous proposition, can now be computed for any pair of perfect graphs, as it linearizes. This result was previously unknown, as the AND product of perfect graphs is not necessarily perfect. For instance, cycle graphs and are perfect (due to the strong perfect graph theorem, stated below), but their AND product is not (also due to the strong perfect graph theorem): it contains an induced odd cycle of length 7, illustrated in Fig. 11.
Figure 11: A non-perfect AND product of perfect graphs: with an induced .
Theorem 16(Strong perfect graph theorem, from [38, Theorem 1.2]).
A graph is perfect if and only if neither nor has an induced odd cycle of length at least 5.
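For tiny graphs, Definition 13 can be checked directly by comparing the chromatic number and the clique number of every induced subgraph. The sketch below does this by exhaustive search, so it is exponential and only illustrative. The helper names are hypothetical. On cycles it agrees with Theorem 16: even cycles are perfect, while the odd holes C5 and C7 are not.

```python
from itertools import combinations


def cycle(n):
    """Adjacency sets of the cycle graph C_n on vertices 0, ..., n-1."""
    return {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}


def clique_number(adj, vertices):
    """Size of a largest clique of the subgraph induced by `vertices` (brute force)."""
    vertices = list(vertices)
    for k in range(len(vertices), 0, -1):
        for sub in combinations(vertices, k):
            if all(b in adj[a] for a, b in combinations(sub, 2)):
                return k
    return 0


def is_colorable(adj, vertices, k):
    """Backtracking test for a proper k-coloring of the subgraph induced by `vertices`."""
    vertices = list(vertices)
    color = {}

    def extend(i):
        if i == len(vertices):
            return True
        v = vertices[i]
        # Only already-colored vertices of the induced subgraph constrain v.
        used = {color[u] for u in adj[v] if u in color}
        for c in range(k):
            if c not in used:
                color[v] = c
                if extend(i + 1):
                    return True
                del color[v]
        return False

    return extend(0)


def chromatic_number(adj, vertices):
    """Smallest k such that the induced subgraph is k-colorable."""
    k = 0
    while not is_colorable(adj, vertices, k):
        k += 1
    return k


def is_perfect(adj):
    """Definition 13: chromatic number equals clique number on every induced subgraph."""
    vertices = list(adj)
    return all(
        chromatic_number(adj, sub) == clique_number(adj, sub)
        for r in range(1, len(vertices) + 1)
        for sub in combinations(vertices, r)
    )


print([n for n in range(4, 8) if is_perfect(cycle(n))])  # [4, 6]
```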
Similarly, we show that the linearization property of and holds for perfect graphs and for all underlying probability distributions, and we provide new single-letter expressions for , in that case.
Theorem 17.
When is a family of perfect probabilistic graphs, we have the following single-letter characterizations:
(61)
(62)
(63)
(64)
The proof of Theorem 17 is given in App. I. As an example, we consider below the AND product of the cycle graphs and .
Corollary 2.
Consider the cycle graphs and and denote by and the probability distributions on the vertices. We have
(65)
(66)
We now explore the combination of a perfect graph with a non-perfect graph. More specifically, we consider the graph where is perfect, for which the linearization of was studied by Tuncel et al. in [21].
The pentagon graph is not perfect, thereby making any disjoint union or AND product involving it non-perfect. However, Theorem 4 provides a non-perfect example where the linearization property holds, offering a single-letter characterization of for the class of graphs , where is perfect.
Let , let be a perfect probabilistic graph, and let , we have
(67)
(68)
Corollary 3.
For every perfect probabilistic graph ,
(69)
IV-BVertex transitive graphs
We study the important class of vertex-transitive graphs, in which all the vertices of the graph play the same “role”, and show that the uniform distribution is capacity-achieving for these graphs.
Definition 15(Vertex-transitive graph).
An automorphism of a graph is a bijection such that for all , if and only if . The group of automorphisms of G is denoted by .
A graph is vertex-transitive if acts transitively on its vertices, i.e. for all , there exists such that .
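For small graphs, Definition 15 can be tested by brute force: enumerate all vertex permutations, keep those that map edges to edges, and check that the orbit of one vertex is the whole vertex set. The sketch below is our own illustration with hypothetical helper names. It only checks vertex transitivity; it does not verify that the uniform distribution is capacity-achieving.

```python
from itertools import permutations


def cycle(n):
    """Adjacency sets of the cycle graph C_n on vertices 0, ..., n-1."""
    return {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}


def automorphisms(adj):
    """Yield all automorphisms of a small graph as dicts (brute force over permutations)."""
    vertices = sorted(adj)
    edges = [(u, v) for u in vertices for v in adj[u] if u < v]
    edge_set = {frozenset(e) for e in edges}
    for image in permutations(vertices):
        f = dict(zip(vertices, image))
        # A bijection that maps every edge to an edge maps the finite edge set onto
        # itself, hence also maps non-edges to non-edges ("if and only if" in Def. 15).
        if all(frozenset((f[u], f[v])) in edge_set for u, v in edges):
            yield f


def is_vertex_transitive(adj):
    """Vertex-transitive iff the orbit of one vertex under Aut(G) is the whole vertex set."""
    start = min(adj)
    orbit = {f[start] for f in automorphisms(adj)}
    return orbit == set(adj)


path3 = {0: {1}, 1: {0, 2}, 2: {1}}
print(is_vertex_transitive(cycle(5)), is_vertex_transitive(path3))  # True False
```

A single orbit suffices: if automorphisms map a fixed vertex to u and to v, composing one with the inverse of the other maps u to v.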
Let be vertex-transitive graphs, their product is also vertex-transitive and
(71)
IV-CThe Schläfli graph
We now study the important case of the Schläfli graph as it offers a counterexample for the linearizations of all quantities studied in this paper. In [20], Haemers showed that the linearization property does not hold for the product of the Schläfli graph with its complement ,
(72)
More specifically, Haemers shows that , and . In this section, we show that a similar conclusion holds for and for .
According to [39, Lemma 3.7], the Schläfli graph and its complement are vertex-transitive, as well as their product .
By Proposition 9, the uniform distribution is capacity-achieving for , , and .
Corollary 5.
Consider the Schläfli graph and its complement . Then,
(73)
(74)
In Theorem 19, we extend Haemers’s results of [20] and we show that the linearization property does not hold for and when the distribution on the vertices is uniform. By using Lemma 1, we also equalize , and similarly , for the AND product and for the disjoint union , up to a certain constant.
Theorem 19.
Let , let be the Schläfli graph and its complement, with uniform distributions on their vertices. Then,
(75)
(76)
(77)
(78)
where is the binary entropy. Moreover, we have
(79)
(80)
We obtain (75) from Theorem 12 and Corollary 5; (76) and (77) come from Proposition 5; and (78) comes from Theorem 4.
Remark 4.
In [40], Alon built infinite families of graphs that satisfy . Results similar to those of Theorem 19 can be derived for these graphs by using their respective capacity-achieving distributions.
V Conclusion
We have shown the equivalences of linearization properties between , , and , as depicted in Fig. 7. We proved the equivalence between the suboptimality of separated zero-error coding on independent channels and the suboptimality of separated compression of independent sources in the zero-error side-information setting, with the same characteristic graph and capacity-achieving distribution.
We also state the following open questions:
-
As pointed out in Theorem 11, for every capacity-achieving distribution of a product graph, the product of its marginals is also capacity-achieving. Are these marginals capacity-achieving for the respective graphs in the product? Conversely, if we consider the product of capacity-achieving distributions of graphs, is this distribution capacity-achieving for the product of graphs? In other words,
(81)
We gave a partial answer in Theorem 12, in the sense that inclusion holds when the linearization of the product holds.
-
We have shown in Theorem 12 and Theorem 13 that the linearization property of holds if and only if the linearization property of holds, where is any capacity-achieving distribution. Can we find graphs such that the linearization property of holds when is capacity-achieving, but does not hold for some that is not capacity-achieving? A negative answer would imply that the linearization property of is equivalent to the linearization property of and for all , similarly to perfect graphs.
-
Finally, we have seen in Corollary 3 that with perfect is an example where the linearization property holds. Is the non-linearization property of tied to specific non-perfect induced subgraphs in each graph in the product? And if so, can we find a minimal family of these graphs?
In order to prove Theorem 4, we will need Lemma 1, Lemma 2 and Lemma 6. The proof of Lemma 1 is a direct consequence of Lemma 3, which is proved in App. A-A. Lemma 2 gives regularity properties of and is proved in App. A-B. Lemma 6 states that if a convex function of meets the linear interpolation of , where are the extreme points of , then is linear. The proof of Lemma 6 is given in App. A-C.
Lemma 6.
Let be a finite set, and be a convex function, and for all , let be the distribution that assigns 1 to the symbol and 0 to the others. Then the following holds:
(82)
(83)
where is the interior of (i.e. the full-support distributions on ).
We can use Lemma 1, which states that , hence . Thus, the function satisfies (82) with the interior point , and is convex by Lemma 2: by Lemma 6 we have
(84)
Conversely, assume (84), then is linear. We can use Lemma 1, and we have .
Lemma 3 is a consequence of a more general result, which considers probability distributions instead of types , for some .
Lemma 7.
Let be a probability distribution with full support and let be any sequence such that its type when . Then we have
(85)
The proof of Lemma 7 is stated in App. A-A1. Now let us prove Lemma 3. Let be a -periodic sequence such that , then for all , and . We can use Lemma 7 and consider every -th term in the limit:
We need several lemmas for this result.
Lemma 8 establishes the distributivity of with respect to for probabilistic graphs, similarly to [41] for graphs without underlying distribution. Lemma 9 states that can be computed with subgraphs induced by sets that have an asymptotic probability one; in particular, we will use it with typical sets of vertices.
Lemma 10 gives the chromatic entropy of a disjoint union of isomorphic probabilistic graphs.
The proofs of Lemma 8, Lemma 9 and Lemma 10 are respectively given in App. A-A2, App. A-A3, and App. A-A4.
Lemma 8.
Let be finite sets, let and . For all and , let and be probabilistic graphs. Then
(89)
Lemma 9.
Let , and be a sequence of sets such that for all , , and when . Then .
Definition 16(Isomorphic probabilistic graphs).
Let and be two probabilistic graphs. We say that is isomorphic to (denoted by ) if there exists an isomorphism between them, i.e. a bijection such that:
•
For all , ,
•
For all , .
Lemma 10.
Let be a finite set, let and let be a family of isomorphic probabilistic graphs, then for all .
Now let us prove Lemma 7. Let , and let . Let be a sequence such that when .
Let us study the limit in (91). For all large enough, as . Therefore, for all , , and large enough, we have
(92)
We have on one hand
(93)
(94)
(95)
(96)
(97)
(98)
(99)
where (93) comes from Lemma 8; (94) comes from the definition of and in (90); (95) is a rearrangement of the terms inside the product; (96) comes from (92); (97) follows from Lemma 10, since the graphs are isomorphic as they do not depend on ; (98) follows from the subadditivity of ; and (99) is the upper bound on given by the highest entropy of a coloring.
On the other hand, we obtain with similar arguments
(100)
(101)
(102)
Note that (101) also comes from the subadditivity of , as for all .
In order to prove Lemma 9, we need Lemma 11. In Lemma 11 we give upper and lower bounds on the chromatic entropy of an induced subgraph , using the chromatic entropy of the whole graph and the probability . The core idea is that if is close to and is large, then is close to . The proof of Lemma 11 is given in App. A-A5.
Lemma 11.
Let and , then
(112)
Remark 5.
can be greater than , even if has fewer vertices and inherits the structure of . This stems from the normalized distribution on the vertices of , which gives more weight to the vertices in . For example, consider
where (resp. ) is the complete (resp. empty) graph with vertices, i.e. there is an edge (resp. no edge) between any pair of distinct vertices, and with being the vertices in the connected component in . Then and .
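The phenomenon of Remark 5 can be reproduced numerically on a different toy example of our own: an edge {u, v} carrying small probability, plus three isolated vertices carrying most of the mass. The sketch assumes the chromatic entropy of [16], i.e. the minimum entropy of the color of a random vertex over all proper colorings, and computes it by brute force. The graph, the probabilities and the function name are illustrative choices. In the whole graph the isolated vertices can share a color with u, so the entropy is small; in the subgraph induced by {u, v}, renormalization makes both colors equally likely.

```python
import math
from itertools import product


def chromatic_entropy(adj, prob):
    """Minimum entropy (bits) of c(X), X ~ prob, over proper colorings c (brute force)."""
    vertices = list(adj)
    best = math.inf
    for colors in product(range(len(vertices)), repeat=len(vertices)):
        c = dict(zip(vertices, colors))
        if any(c[u] == c[v] for u in vertices for v in adj[u]):
            continue  # not a proper coloring
        mass = {}
        for v in vertices:
            mass[c[v]] = mass.get(c[v], 0.0) + prob[v]
        best = min(best, -sum(p * math.log2(p) for p in mass.values() if p > 0))
    return best


# Whole graph: an unlikely edge {u, v} plus three isolated, likely vertices.
g = {"u": {"v"}, "v": {"u"}, "w1": set(), "w2": set(), "w3": set()}
p = {"u": 0.05, "v": 0.05, "w1": 0.3, "w2": 0.3, "w3": 0.3}
# Subgraph induced by {u, v}, with the distribution renormalized to that set.
g_sub = {"u": {"v"}, "v": {"u"}}
p_sub = {"u": 0.5, "v": 0.5}
print(chromatic_entropy(g, p), chromatic_entropy(g_sub, p_sub))  # about 0.286 and 1.0
```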
Now let us prove Lemma 9. By Lemma 11, we have for all :
(113)
Since , and when , the desired result follows immediately by normalization and limit.
Let be isomorphic probabilistic graphs and such that . Let be the coloring of with minimal entropy, and let be the coloring of defined by
(114)
(115)
where is the unique integer such that , and is an isomorphism between and . In other words applies the same coloring pattern on each connected component of . We have
(116)
(117)
(118)
(119)
(120)
where denotes the entropy of a distribution; (118) comes from the definition of ; and (120) comes from the definition of .
Now let us prove the upper bound on . Let be a coloring of , and let (i.e. is the index of the connected component for which the entropy of the coloring induced by is minimal). We have
(121)
(122)
(123)
(124)
(125)
where (122) follows from the concavity of ; (123) follows from the definition of ; (124) comes from the fact that induces a coloring of ; (125) comes from the fact that and are isomorphic. Now, we can combine the bounds (120) and (125): for all coloring of we have
(126)
which yields the desired equality when taking the infimum over .
Let and be the optimal colorings of and , respectively. Consider the coloring of defined by if , otherwise.
(Lower bound) On one hand, we have
(127)
(128)
(129)
(130)
where (127) comes from the fact that is a coloring of ; (128) is a decomposition using conditional entropies; (129) comes from the construction of : ; (130) follows from the optimality of as a coloring of .
(Upper bound) On the other hand,
(131)
(132)
(133)
where (131) comes from the fact that induces a coloring of ; (132) is a decomposition using conditional entropies; (133) results from the elimination of negative terms and the optimality of .
In order to prove Lemma 2 we need Lemma 7, which can be found in App. A-A1; and Lemma 12, which is a generalization for infinite sequences of the following observation: if satisfies with and , then can be separated into two subsequences and such that and .
Lemma 12(Type-splitting lemma).
Let be a sequence such that when , let and such that
(134)
Then there exists a sequence such that the two extracted sequences and satisfy
(135)
(136)
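The finite observation that motivates Lemma 12 can be made concrete: if a length-n sequence has type λP1 + (1 - λ)P2 with λn an integer, picking λn positions whose symbols have type P1 leaves a complementary subsequence of type P2. The sketch below uses a greedy left-to-right selection, which is our own choice; the name `split_by_type` is hypothetical, and the asymptotic statement of Lemma 12 is not reproduced.

```python
from collections import Counter
from fractions import Fraction


def split_by_type(seq, p1, m):
    """Greedily pick m positions of seq whose symbols have type p1; the rest form the second part.

    Requires m * p1[a] to be an integer not exceeding the number of a's in seq, for every a.
    """
    counts = Counter(seq)
    need = {}
    for a, p in p1.items():
        k = Fraction(p) * m
        if k.denominator != 1 or k > counts[a]:
            raise ValueError("p1 is not realizable as the type of a length-m subsequence")
        need[a] = int(k)
    first, second = [], []
    for i, a in enumerate(seq):
        if need.get(a, 0) > 0:
            first.append(i)
            need[a] -= 1
        else:
            second.append(i)
    return first, second


def type_of(symbols):
    """Exact type (empirical distribution) of a finite sequence."""
    n = len(symbols)
    return {a: Fraction(c, n) for a, c in Counter(symbols).items()}


seq = list("bbabab")  # type (1/3, 2/3) = (4/6) * (1/2, 1/2) + (2/6) * (0, 1)
first, second = split_by_type(seq, {"a": Fraction(1, 2), "b": Fraction(1, 2)}, 4)
print(first, second)  # [0, 1, 2, 4] [3, 5]
```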
The proof of Lemma 12 is given in App. A-B1. Now let us prove Lemma 2. We recall the definition of the function
(137)
( Lipschitz) Let us first prove that the function is Lipschitz. For all we need to bound the quantity ; by Lemma 7 this is equivalent to bounding
(138)
where when .
Fix , we assume that the quantity inside in (138) is positive; the other case can be treated with the same arguments by symmetry of the roles. We have
(139)
(140)
(141)
(142)
(143)
(144)
where and ; (140) follows from the removal of terms in the second product, as for all probabilistic graphs ; (141) is a rearrangement of the terms in the first product, as for all real numbers ; (142) comes from the subadditivity of ; (143) follows from for all ; (144) results from for all .
By normalization and limit, it follows that
(145)
(146)
Hence is -Lipschitz.
( convex) Let us now prove that is convex. Let , and , we have by Lemma 7
(147)
where when . By Lemma 12, there exists such that the decomposition of into two subsequences and satisfies
(148)
(149)
For all , let , we have
(150)
(151)
(152)
(153)
(154)
where (151) comes from (147); (153) follows from the subadditivity of ; (154) comes from (148), (149) and Lemma 7. Since (154) holds for all and , we have that is convex.
In order to prove Proposition 5 we need Lemma 13, which is a consequence of Marton’s formula in Theorem 9 applied to a disjoint union. The proof of Lemma 13 can be found in App. B-A.
The probabilistic graph has as underlying distribution. Let be two random variables such that is drawn with , and is drawn with , so that
(177)
We have
(178)
(179)
(180)
(181)
(182)
where (179) comes from Theorem 9 and (177); and (180) comes from the fact that can be written as a function of : by definition, the vertex set of writes and , therefore is the unique index such that .
Let be a sequence such that for all , is an independent set in , and
(186)
the existence of the sequence follows from the definition of .
Let be the sequence defined by: for all ,
(187)
The terms of the sequence are in , which is a compact set. Therefore, by the Bolzano-Weierstrass theorem, has a convergent subsequence , where is strictly increasing. We denote by the corresponding subsequence of independent sets, and
(188)
By construction, we also have
(189)
Let us build an adequate sequence of codebooks with type converging uniformly to , and with asymptotic rate . For all , let
(190)
where .
It can be easily observed that and : by construction we have
(191)
Furthermore, for all , is independent in , as it is contained in the independent set .
Now let us prove that when . Let us draw a codeword
(192)
uniformly from , and show that it is in with high probability. On one hand, for all , the average type of writes
(193)
On the other hand,
(194)
(195)
(196)
(197)
(198)
(199)
(200)
(201)
(202)
(203)
where (195) and (199) come from the construction of ; (198) comes from the construction of ; (201) follows from the union bound; (202) comes from Chebyshev’s inequality and (193); (203) follows from , as the random variables are iid and take values in . Hence
(204)
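The concentration step (201)-(203) can be illustrated by simulation. When m blocks are drawn independently and uniformly from a codebook, the frequency of a symbol in their concatenation is the average of m iid per-block frequencies with values in [0, 1]. Their variance is at most 1/4, so Chebyshev's inequality bounds the probability of a deviation of at least ε by 1/(4mε²). The codebook, symbol, m and ε below are arbitrary illustrative choices, and the function names are hypothetical.

```python
import random


def chebyshev_bound(m, eps):
    """Bound on P(|mean of m iid [0,1]-valued variables - expectation| >= eps): (1/4) / (m * eps**2)."""
    return 0.25 / (m * eps ** 2)


def deviation_rate(codebook, symbol, m, eps, trials, seed=0):
    """Fraction of trials in which the concatenation of m uniformly drawn codewords has a
    frequency of `symbol` at least eps away from its average frequency in the codebook."""
    rng = random.Random(seed)
    n = len(codebook[0])
    mean = sum(w.count(symbol) for w in codebook) / (n * len(codebook))
    bad = 0
    for _ in range(trials):
        blocks = [rng.choice(codebook) for _ in range(m)]
        freq = sum(b.count(symbol) for b in blocks) / (n * m)
        if abs(freq - mean) >= eps:
            bad += 1
    return bad / trials


codebook = ["0011", "0101", "1111", "0000"]
m, eps = 200, 0.1
print(deviation_rate(codebook, "1", m, eps, trials=500), chebyshev_bound(m, eps))
```

The Chebyshev bound is loose but dimension-free, which is all the union bound over symbols in (201) requires.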
By combining (191), (204) and Lemma 4, it follows that
The function is concave on the convex compact set , therefore its set of maximizers is convex. Furthermore, by Theorem 10, the set is nonempty and satisfies
The proof techniques used here are similar to those in the proof of Theorem 10 in App. C.
Let us start by showing that Theorem 11 is true when has two elements. Let , and be two graphs, and let . We will prove that is also capacity-achieving by building an adequate sequence of codebooks.
For all , let such that is an independent set in , and
(222)
(223)
The existence of such a sequence is given by Lemma 4 and Proposition 6. Let
Let us build a sequence of codebooks with asymptotic rate , such that the type of their codewords converge uniformly to :
(226)
where
(227)
and where for all , the shifted codebook is defined by
(228)
By construction, thanks to (226), and thanks to (227) and (225); therefore we have
(229)
Furthermore, is an independent set in , as it is contained in the product independent set ; note that this holds because the shifted codebook is an independent set in for all .
Now let us prove that . Let us draw a codeword uniformly from :
(230)
where for all , is a random -sequence drawn uniformly from . We want to prove that with high probability.
On the one hand, we have to determine the average type of the random variables , which are iid copies of , where each is drawn uniformly from , and the are mutually independent.
(231)
(232)
(233)
(234)
(235)
(236)
where ; (233) comes from the construction of in (228); and (235) comes from the following observation:
(237)
(238)
where the index is taken modulo .
On the other hand we have
(239)
(240)
(241)
(242)
(243)
(244)
(245)
(246)
(247)
where (241) and (242) come from the construction of ; (244) comes from the construction of ; (245) follows from the union bound; (246) comes from Chebyshev’s inequality and (236); and (247) comes from the fact that , as the random variables are iid and take values in . Hence
(248)
where the second equality holds as the shifted codebooks all have cardinality .
Thus, by combining (248), Lemma 4, and Proposition 6 we obtain
(249)
hence .
Therefore, Theorem 11 is proved when has two elements:
(250)
Now let us consider the case where has a cardinality greater than 2. Let . By considering the product graphs
(251)
for all , and applying (250) successively, we obtain
(252)
(253)
(254)
Appendix F Results on capacity-achieving distributions
The techniques used in this proof are the same as in the proof of Theorem 12.
We prove Theorem 13 in two steps, which are Lemma 16 and Lemma 17; their proofs are respectively given in App. H-A and H-B.
Lemma 16.
Let
(303)
we have
(304)
Lemma 17.
Let
(305)
for all the following holds
(306)
Now let us prove Theorem 13. Let ,
we have by Lemma 16
where (315) and (316) come from (312) and Proposition 6; (317) comes from Proposition 5; (318) comes from (313) and Proposition 6; and (319) comes from (314) and Lemma 5.
Assume that , then equality holds in (315) to (319), therefore the following holds:
where (324) comes from (321) and Proposition 6; (325) comes from (29), see [3, Theorem 4]; (326) and (327) come from (323) and Lemma 5, which can be found in App. H-A; (328) and (329) come from (322) and Proposition 6.
Assume that , then equality holds in (324) to (329). In particular as a consequence of the equality between (326) and (327); and as a consequence of the equality between (328) and (329). Thus, for all the following holds:
Lemma 18 comes from [35, Corollary 1], and states that the function , defined analogously to , is always linear. The proof of Lemma 19 is given in App. I-A.
For all , let be a perfect probabilistic graph. By Lemma 19, is also perfect; and we have by Theorem 14. We also have by Lemma 18 and Theorem 14 used on the perfect graphs . Thus
(331)
By Theorem 4, it follows that , where the last equality comes from Theorem 14.
Let be a perfect probabilistic graph. Let and . We have since is perfect, and therefore , as . Thus all the graphs are perfect.
Conversely, assume that for all , is perfect. Then for all , can be written as where for all , and we have for all :
(332)
(333)
(334)
and similarly, .
Hence is also perfect.
References
[1]
C. E. Shannon, “A mathematical theory of communication,” The Bell
system technical journal, vol. 27, no. 3, pp. 379–423, 1948.
[2]
D. A. Huffman, “A method for the construction of minimum-redundancy codes,”
Proceedings of the IRE, vol. 40, no. 9, pp. 1098–1101, 1952.
[3]
C. Shannon, “The zero error capacity of a noisy channel,” IRE
Transactions on Information Theory, vol. 2, no. 3, pp. 8–19, 1956.
[4]
A. Vesel and J. Žerovnik, “Improved lower bound on the Shannon capacity
of C7,” Information Processing Letters, vol. 81, no. 5, pp.
277–282, 2002.
[5]
B. Codenotti, I. Gerace, and G. Resta, “Some remarks on the Shannon capacity
of odd cycles,” Ars Combinatoria, vol. 66, pp. 243–258, 2003.
[6]
S. C. Polak and A. Schrijver, “New lower bound on the Shannon capacity of
C7 from circular graphs,” Information Processing Letters, vol.
143, pp. 37–40, 2019.
[7]
I. Csiszár and J. Körner, Information theory: coding theorems for
discrete memoryless systems. Cambridge University Press, 2011.
[8]
S. Klavzar, R. Hammack, and W. Imrich, “Handbook of graph products,” 2011.
[9]
C. Berge, Graphs and Hypergraphs, ser. North-Holland mathematical
library. Amsterdam, 1973.
[10]
M. Grötschel, L. Lovász, and A. Schrijver, “Polynomial algorithms for
perfect graphs,” Ann. Discrete Math, vol. 21, pp. 325–356, 1984.
[11]
C. Berge, “Färbung von Graphen, deren sämtliche bzw. deren ungerade Kreise
starr sind,” Wissenschaftliche Zeitschrift, 1961.
[12]
M. Chudnovsky, N. Robertson, P. D. Seymour, and R. Thomas, “Progress on
perfect graphs,” Mathematical Programming, vol. 97, no. 1-2, pp.
405–422, 2003.
[13]
L. Lovász, “On the Shannon capacity of a graph,” IEEE Transactions
on Information Theory, vol. 25, no. 1, pp. 1–7, 1979.
[14]
H. Witsenhausen, “The zero-error side information problem and chromatic
numbers (corresp.),” IEEE Transactions on Information Theory,
vol. 22, no. 5, pp. 592–593, 1976.
[15]
D. Slepian and J. Wolf, “Noiseless coding of correlated information sources,”
IEEE Transactions on information Theory, vol. 19, no. 4, pp. 471–480,
1973.
[16]
N. Alon and A. Orlitsky, “Source coding and graph entropies,” IEEE
Transactions on Information Theory, vol. 42, no. 5, pp. 1329–1339, 1996.
[17]
P. Koulgi, E. Tuncel, S. L. Regunathan, and K. Rose, “On zero-error source
coding with decoder side information,” IEEE Transactions on
Information Theory, vol. 49, no. 1, pp. 99–111, 2003.
[18]
J. Körner and G. Longo, “Two-step encoding for finite sources,” IEEE
Transactions on Information Theory, vol. 19, no. 6, pp. 778–782, 1973.
[19]
J. Körner, “Coding of an information source having ambiguous alphabet and the
entropy of graphs,” Transactions of the 6th Prague Conference on
Information Theory, pp. 411–425, 1973.
[20]
W. Haemers, “On some problems of Lovász concerning the Shannon capacity
of a graph,” IEEE Transactions on Information Theory, vol. 25,
no. 2, pp. 231–232, 1979.
[21]
E. Tuncel, J. Nayak, P. Koulgi, and K. Rose, “On complementary graph
entropy,” IEEE Transactions on Information Theory, vol. 55, no. 6,
pp. 2537–2546, 2009.
[22]
A. Wigderson and J. Zuiddam, Asymptotic Spectra: Theory, Applications and
Extensions, manuscript, 2023.
[23]
A. Schrijver, “On the Shannon capacity of sums and products of graphs,”
Indagationes Mathematicae, vol. 34, no. 1, pp. 37–41, 2023.
[24]
I. Csiszár and J. Körner, “On the capacity of the arbitrarily varying
channel for maximum probability of error,” Zeitschrift für
Wahrscheinlichkeitstheorie und verwandte Gebiete, vol. 57, no. 1, pp.
87–101, 1981.
[25]
L. Gargano, J. Körner, and U. Vaccaro, “Capacities: from information
theory to extremal set theory,” Journal of Combinatorial Theory,
Series A, vol. 68, no. 2, pp. 296–316, 1994.
[26]
K. Marton, “On the Shannon capacity of probabilistic graphs,” Journal
of Combinatorial Theory, Series B, vol. 57, no. 2, pp. 183–195, 1993.
[27]
L. Lovász, “On the Shannon capacity of a graph,” IEEE
Transactions on Information Theory, vol. 25, no. 1, pp. 1–7, 1979.
[28]
I. Sason, “Observations on the Lovász θ-function, graph capacity,
eigenvalues, and strong products,” Entropy, vol. 25, no. 1, p. 104,
2023.
[29]
A. Vesel and J. Žerovnik, “Improved lower bound on the Shannon capacity
of $C_7$,” Information Processing Letters, vol. 81, no. 5, pp.
277–282, 2002.
[30]
K. A. Mathew and P. R. Östergård, “New lower bounds for the Shannon
capacity of odd cycles,” Designs, Codes and Cryptography, vol. 84,
pp. 13–22, 2017.
[31]
S. C. Polak and A. Schrijver, “New lower bound on the Shannon capacity of $C_7$
from circular graphs,” Information Processing Letters, vol. 143, pp.
37–40, 2019.
[32]
M. Chudnovsky and P. D. Seymour, “The structure of claw-free graphs,” in
BCC, 2005, pp. 153–171.
[33]
B. Bukh and C. Cox, “On a fractional version of Haemers’ bound,” IEEE
Transactions on Information Theory, vol. 65, no. 6, pp. 3340–3348, 2018.
[34]
L. Gao, S. Gribling, and Y. Li, “On a tracial version of Haemers bound,”
IEEE Transactions on Information Theory, 2022.
[35]
G. Simonyi, “Perfect graphs and graph entropy. An updated survey,” in
Perfect Graphs, B. A. Reed and J. L. Ramírez Alfonsín, Eds. John Wiley & Sons, 2001, ch. 13, pp. 293–328.
[36]
H. Touchette, “Legendre-Fenchel transforms in a nutshell,” URL
http://www.maths.qmul.ac.uk/~ht/archive/lfth2.pdf, 2005.
[37]
I. Csiszár, J. Körner, L. Lovász, K. Marton, and G. Simonyi,
“Entropy splitting for antiblocking corners and perfect graphs,”
Combinatorica, vol. 10, no. 1, pp. 27–40, 1990.
[38]
M. Chudnovsky, N. Robertson, P. Seymour, and R. Thomas, “The strong perfect
graph theorem,” Annals of Mathematics, pp. 51–229, 2006.
[39]
P. J. Cameron, “6-transitive graphs,” Journal of Combinatorial Theory,
Series B, vol. 28, no. 2, pp. 168–179, 1980.
[40]
N. Alon, “The Shannon capacity of a union,” Combinatorica, vol. 18,
no. 3, pp. 301–310, 1998.
[41]
J. Zuiddam, Algebraic complexity, asymptotic spectra and
entanglement polytopes. Institute for Logic, Language and Computation, 2018.
[42]
G. Simonyi, “Graph entropy: a survey,” Combinatorial Optimization,
vol. 20, pp. 399–441, 1995.