-
Universal Reverse Information Projections and Optimal E-statistics
Authors:
Tyron Lardy,
Peter Grünwald,
Peter Harremoës
Abstract:
Information projections have found important applications in probability theory, statistics, and related areas. In the field of hypothesis testing in particular, the reverse information projection (RIPr) has recently been shown to lead to so-called growth-rate optimal (GRO) e-statistics for testing simple alternatives against composite null hypotheses. However, the RIPr as well as the GRO criterion are undefined whenever the infimum information divergence between the null and alternative is infinite. We show that in such scenarios there often still exists an element in the alternative that is 'closest' to the null: the universal reverse information projection. The universal reverse information projection and its non-universal counterpart coincide whenever information divergence is finite. Furthermore, the universal RIPr is shown to lead to optimal e-statistics in a sense that is a novel, but natural, extension of the GRO criterion. We also give conditions under which the universal RIPr is a strict sub-probability distribution, as well as conditions under which an approximation of the universal RIPr leads to approximate e-statistics. For this case we provide tight relations between the corresponding approximation rates.
Submitted 4 December, 2023; v1 submitted 28 June, 2023;
originally announced June 2023.
-
Unnormalized Measures in Information Theory
Authors:
Peter Harremoës
Abstract:
Information theory is built on probability measures and by definition a probability measure has total mass 1. Probability measures are used to model uncertainty, and one may ask how important it is that the total mass is one. We claim that the main reason to normalize measures is that probability measures are related to codes via Kraft's inequality. Using a minimum description length approach to statistics we will demonstrate that measures that are not normalized require a new interpretation that we will call the Poisson interpretation. With the Poisson interpretation many problems can be simplified. The focus will shift from probabilities to mean values. We give examples of improvements of test procedures, improved inequalities, simplified algorithms, new projection results, and improvements in our description of quantum systems.
Submitted 5 February, 2022;
originally announced February 2022.
-
Rate Distortion Theory for Descriptive Statistics
Authors:
Peter Harremoës
Abstract:
Rate distortion theory was developed for optimizing lossy compression of data, but it also has a lot of applications in statistics. In this paper we will see how rate distortion theory can be used to analyze a complicated data set involving orientations of early Islamic mosques. The analysis involves testing, identification of outliers, choice of compression rate, calculation of optimal reconstruction points, and assigning "descriptive confidence regions" to the reconstruction points. In this paper the focus will be on the methods, so the integrity of the data set and the interpretation of the results will not be discussed.
Submitted 16 February, 2022; v1 submitted 10 January, 2022;
originally announced January 2022.
-
Bounds on the Information Divergence for Hypergeometric Distributions
Authors:
Peter Harremoës,
František Matúš
Abstract:
The hypergeometric distributions have many important applications, but they have not had sufficient attention in information theory. Hypergeometric distributions can be approximated by binomial distributions or Poisson distributions. In this paper we present upper and lower bounds on information divergence. These bounds are important for statistical testing and a better understanding of the notion of exchangeability.
Submitted 7 February, 2020;
originally announced February 2020.
-
From Thermodynamic Sufficiency to Information Causality
Authors:
Peter Harremoës
Abstract:
The principle called information causality has been used to deduce Tsirelson's bound. In this paper we derive information causality from monotonicity of divergence and relate it to more basic principles related to measurements on thermodynamic systems. This principle is more fundamental in the sense that it can be formulated for both unipartite systems and multipartite systems while information causality is only defined for multipartite systems. Thermodynamic sufficiency is a strong condition that puts severe restrictions on the shape of the state space, to such an extent that we conjecture that under very weak regularity conditions it can be used to deduce the complex Hilbert space formalism of quantum theory. Since the notion of sufficiency is relevant for all convex optimization problems there are many examples where it does not apply.
Submitted 7 February, 2020;
originally announced February 2020.
-
Statistical Inference and Exact Saddle Point Approximations
Authors:
Peter Harremoës
Abstract:
Statistical inference may follow a frequentist approach or it may follow a Bayesian approach or it may use the minimum description length principle (MDL). Our goal is to identify situations in which these different approaches to statistical inference coincide. It is proved that for exponential families MDL and Bayesian inference coincide if and only if the renormalized saddle point approximation for the conjugated exponential family is exact. For 1-dimensional exponential families the only families with exact renormalized saddle point approximations are the Gaussian location family, the Gamma family and the inverse Gaussian family. They are conjugated families of the Gaussian location family, the Gamma family and the Poisson-exponential family. The first two families are self-conjugated, implying that only for the first two families is the Bayesian approach consistent with the frequentist approach. In higher dimensions there are more examples.
Submitted 6 May, 2018;
originally announced May 2018.
-
Entropy on Spin Factors
Authors:
Peter Harremoës
Abstract:
Recently it has been demonstrated that the Shannon entropy and the von Neumann entropy are the only entropy functions that generate local Bregman divergences as long as the state space has rank 3 or higher. In this paper we will study the properties of Bregman divergences for convex bodies of rank 2. The two most important convex bodies of rank 2 can be identified with the bit and the qubit. We demonstrate that if a convex body of rank 2 has a Bregman divergence that satisfies sufficiency then the convex body is spectral and if the Bregman divergence is monotone then the convex body has the shape of a ball. A ball can be represented as the state space of a spin factor, which is the simplest type of Jordan algebra. We also study the existence of recovery maps for Bregman divergences on spin factors. In general the convex bodies of rank 2 appear as faces of state spaces of higher rank. Therefore our results give strong restrictions on which convex bodies could be the state space of a physical system with a well-behaved entropy function.
Submitted 4 May, 2018; v1 submitted 11 July, 2017;
originally announced July 2017.
-
Quantum Information on Spectral Sets
Authors:
Peter Harremoës
Abstract:
For convex optimization problems Bregman divergences appear as regret functions. Such regret functions can be defined on any convex set but if a sufficiency condition is added the regret function must be proportional to information divergence and the convex set must be spectral. Spectral sets are sets where different orthogonal decompositions of a state into pure states have unique mixing coefficients. Only on such spectral sets is it possible to define well-behaved information theoretic quantities like entropy and divergence. It is only possible to perform measurements in a reversible way if the state space is spectral. The most important spectral sets can be represented as positive elements of Jordan algebras with trace 1. This means that Jordan algebras provide a natural framework for studying quantum information. We compare information theory on Hilbert spaces with information theory in more general Jordan algebras, and conclude that much of the formalism is unchanged but also identify some important differences.
Submitted 10 February, 2017; v1 submitted 23 January, 2017;
originally announced January 2017.
-
Divergence and Sufficiency for Convex Optimization
Authors:
Peter Harremoës
Abstract:
Logarithmic score and information divergence appear in information theory, statistics, statistical mechanics, and portfolio theory. We demonstrate that all these topics involve some kind of optimization that leads directly to regret functions and such regret functions are often given by a Bregman divergence. If the regret function also fulfills a sufficiency condition it must be proportional to information divergence. We will demonstrate that sufficiency is equivalent to the apparently weaker notion of locality and it is also equivalent to the apparently stronger notion of monotonicity. These sufficiency conditions have quite different relevance in the different areas of application, and often they are not fulfilled. Therefore sufficiency conditions can be used to explain when results from one area can be transferred directly to another and when one will experience differences.
Submitted 10 April, 2017; v1 submitted 4 January, 2017;
originally announced January 2017.
-
Maximum Entropy and Sufficiency
Authors:
Peter Harremoës
Abstract:
The notion of Bregman divergence and sufficiency will be defined on general convex state spaces. It is demonstrated that only spectral sets can have a Bregman divergence that satisfies a sufficiency condition. Positive elements with trace 1 in a Jordan algebra are examples of spectral sets, and the most important example is the set of density matrices with complex entries. It is conjectured that information theoretic considerations lead directly to the notion of Jordan algebra under some regularity conditions.
Submitted 3 September, 2016; v1 submitted 8 July, 2016;
originally announced July 2016.
-
Sufficiency on the Stock Market
Authors:
Peter Harremoës
Abstract:
It is well known that there are a number of relations between theoretical finance and information theory. Some of these relations are exact and some are approximate. In this paper we will explore some of these relations and determine under which conditions the relations are exact. It turns out that portfolio theory always leads to Bregman divergences. The Bregman divergence is only proportional to information divergence in situations that are essentially equal to the type of gambling studied by Kelly. This can be related to an abstract sufficiency condition.
Submitted 27 January, 2016;
originally announced January 2016.
-
Bounds on Tail Probabilities in Exponential families
Authors:
Peter Harremoës
Abstract:
In this paper we present various new inequalities for tail probabilities for distributions that are elements of the most important exponential families. These families include the Poisson distributions, the Gamma distributions, the binomial distributions, the negative binomial distributions and the inverse Gaussian distributions. All these exponential families have simple variance functions and the variance functions play an important role in the exposition. All the inequalities presented in this paper are formulated in terms of the signed log-likelihood. The inequalities are of a qualitative nature in that they can be formulated either in terms of stochastic domination or in terms of an intersection property that states that a certain discrete distribution is very close to a certain continuous distribution.
Submitted 8 February, 2016; v1 submitted 20 January, 2016;
originally announced January 2016.
-
Thinning and Information Projections
Authors:
Peter Harremoës,
Oliver Johnson,
Ioannis Kontoyiannis
Abstract:
In this paper we establish lower bounds on information divergence of a distribution on the integers from a Poisson distribution. These lower bounds are tight and in the cases where a rate of convergence in the Law of Thin Numbers can be computed the rate is determined by the lower bounds proved in this paper. General techniques for getting lower bounds in terms of moments are developed. The results about lower bounds in the Law of Thin Numbers are used to derive similar results for the Central Limit Theorem.
Submitted 17 January, 2016;
originally announced January 2016.
-
Proper Scoring and Sufficiency
Authors:
Peter Harremoës
Abstract:
Logarithmic score and information divergence appear in information theory, statistics, statistical mechanics, and portfolio theory. We demonstrate that all these topics involve some kind of optimization that leads directly to the use of Bregman divergences. If a sufficiency condition is also fulfilled the Bregman divergence must be proportional to information divergence. The sufficiency condition has quite different consequences in the different areas of application, and often it is not fulfilled. Therefore the sufficiency condition can be used to explain when results from one area can be transferred directly to another and when one will experience differences.
Submitted 25 July, 2015;
originally announced July 2015.
-
Lattices with non-Shannon Inequalities
Authors:
Peter Harremoës
Abstract:
We study the existence or absence of non-Shannon inequalities for variables that are related by functional dependencies. Although the power-set on four variables is the smallest Boolean lattice with non-Shannon inequalities there exist lattices with many more variables without non-Shannon inequalities. We search for conditions that ensure that no non-Shannon inequalities exist. It is demonstrated that 3-dimensional distributive lattices cannot have non-Shannon inequalities and planar modular lattices cannot have non-Shannon inequalities. The existence of non-Shannon inequalities is related to the question of whether a lattice is isomorphic to a lattice of subgroups of a group.
Submitted 15 February, 2015;
originally announced February 2015.
-
Mutual information of Contingency Tables and Related Inequalities
Authors:
Peter Harremoës
Abstract:
For testing independence it is very popular to use either the $χ^{2}$-statistic or the $G^{2}$-statistic (mutual information). Asymptotically both are $χ^{2}$-distributed, so an obvious question is which of the two statistics has a distribution that is closest to the $χ^{2}$-distribution. Surprisingly, the distribution of mutual information is much better approximated by a $χ^{2}$-distribution than that of the $χ^{2}$-statistic. For technical reasons we shall focus on the simplest case with one degree of freedom. We introduce the signed log-likelihood and demonstrate that its distribution function can be related to the distribution function of a standard Gaussian by inequalities. For the hypergeometric distribution we formulate a general conjecture about how close the signed log-likelihood is to a standard Gaussian, and this conjecture gives much more accurate estimates of the tail probabilities of this type of distribution than previously published results. The conjecture has been proved numerically in all cases relevant for testing independence and further evidence of its validity is given.
Submitted 1 February, 2014;
originally announced February 2014.
-
Horizon-Independent Optimal Prediction with Log-Loss in Exponential Families
Authors:
Peter Bartlett,
Peter Grunwald,
Peter Harremoes,
Fares Hedayati,
Wojciech Kotlowski
Abstract:
We study online learning under logarithmic loss with regular parametric models. Hedayati and Bartlett (2012b) showed that a Bayesian prediction strategy with Jeffreys prior and sequential normalized maximum likelihood (SNML) coincide and are optimal if and only if the latter is exchangeable, and if and only if the optimal strategy can be calculated without knowing the time horizon in advance. They raised the question of which families have exchangeable SNML strategies. This paper fully answers this open problem for one-dimensional exponential families. The exchangeability can happen only for three classes of natural exponential family distributions, namely the Gaussian, Gamma, and the Tweedie exponential family of order 3/2. Keywords: SNML Exchangeability, Exponential Family, Online Learning, Logarithmic Loss, Bayesian Strategy, Jeffreys Prior, Fisher Information
Submitted 19 May, 2013;
originally announced May 2013.
-
Extendable MDL
Authors:
Peter Harremoës
Abstract:
In this paper we show that a combination of the minimum description length principle and an exchangeability condition leads directly to the use of Jeffreys prior. This approach works in most cases even when Jeffreys prior cannot be normalized. Kraft's inequality links codes and distributions but a closer look at this inequality demonstrates that this link only makes sense when sequences are considered as prefixes of potentially longer sequences. For technical reasons only results for exponential families are stated. Results on when Jeffreys prior can be normalized after conditioning on an initializing string are given. An exotic case where no initial string allows Jeffreys prior to be normalized is given and some ways of handling such exotic cases are discussed.
Submitted 19 May, 2013; v1 submitted 28 January, 2013;
originally announced January 2013.
-
Minimum KL-divergence on complements of $L_1$ balls
Authors:
Daniel Berend,
Peter Harremoës,
Aryeh Kontorovich
Abstract:
Pinsker's widely used inequality upper-bounds the total variation distance $||P-Q||_1$ in terms of the Kullback-Leibler divergence $D(P||Q)$. Although in general a bound in the reverse direction is impossible, in many applications the quantity of interest is actually $D^*(P,\eps)$ --- defined, for an arbitrary fixed $P$, as the infimum of $D(P||Q)$ over all distributions $Q$ that are $\eps$-far away from $P$ in total variation. We show that $D^*(P,\eps)\le C\eps^2 + O(\eps^3)$, where $C=C(P)=1/2$ for "balanced" distributions, thereby providing a kind of reverse Pinsker inequality. An application to large deviations is given, and some of the structural results may be of independent interest. Keywords: Pinsker inequality, Sanov's theorem, large deviations
Submitted 20 February, 2014; v1 submitted 27 June, 2012;
originally announced June 2012.
-
Rényi Divergence and Kullback-Leibler Divergence
Authors:
Tim van Erven,
Peter Harremoës
Abstract:
Rényi divergence is related to Rényi entropy much like Kullback-Leibler divergence is related to Shannon's entropy, and comes up in many settings. It was introduced by Rényi as a measure of information that satisfies almost the same axioms as Kullback-Leibler divergence, and depends on a parameter that is called its order. In particular, the Rényi divergence of order 1 equals the Kullback-Leibler divergence.
We review and extend the most important properties of Rényi divergence and Kullback-Leibler divergence, including convexity, continuity, limits of $σ$-algebras and the relation of the special order 0 to the Gaussian dichotomy and contiguity. We also show how to generalize the Pythagorean inequality to orders different from 1, and we extend the known equivalence between channel capacity and minimax redundancy to continuous channel inputs (for all orders) and present several other minimax results.
Submitted 24 April, 2014; v1 submitted 12 June, 2012;
originally announced June 2012.
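As a small numerical illustration of the statement in the abstract above that the Rényi divergence of order 1 equals the Kullback-Leibler divergence, the following sketch evaluates both on a toy example. The function names and the distributions `p` and `q` are assumptions made for illustration and do not come from the paper; the code assumes finite distributions with `q` strictly positive and uses natural logarithms.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P||Q) in nats (assumes q_i > 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def renyi_divergence(p, q, alpha):
    """Renyi divergence of order alpha > 0 in nats (assumes q_i > 0).

    Order 1 is defined by continuity as the Kullback-Leibler divergence.
    """
    if alpha == 1:
        return kl_divergence(p, q)
    s = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log(s) / (alpha - 1)


# Hypothetical example distributions on three points.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

for alpha in (0.5, 0.9, 0.999, 1.001, 2.0):
    print(alpha, renyi_divergence(p, q, alpha))
print("KL", kl_divergence(p, q))
```

For orders close to 1 the printed values approach the Kullback-Leibler value, and the values do not decrease as the order grows, which is a standard property of Rényi divergence.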
-
Some Refinements of Large Deviation Tail Probabilities
Authors:
Laszlo Gyorfi,
Peter Harremoes,
Gabor Tusnady
Abstract:
We study tail probabilities via some Gaussian approximations. Our results make refinements to large deviation theory. The proof builds on classical results by Bahadur and Rao. Binomial distributions and their tail probabilities are discussed in more detail.
Submitted 4 May, 2012;
originally announced May 2012.
-
Information Divergence is more chi squared distributed than the chi squared statistics
Authors:
Peter Harremoës,
Gábor Tusnády
Abstract:
For testing goodness of fit it is very popular to use either the chi square statistic or the G statistic (information divergence). Asymptotically both are chi square distributed, so an obvious question is which of the two statistics has a distribution that is closest to the chi square distribution. Surprisingly, when there is only one degree of freedom it seems that the distribution of information divergence is much better approximated by a chi square distribution than that of the chi square statistic. For random variables we introduce a new transformation that transforms several important distributions into new random variables that are almost Gaussian. For the binomial distributions and the Poisson distributions we formulate a general conjecture about how close their transforms are to the Gaussian. The conjecture is proved for Poisson distributions.
Submitted 17 June, 2012; v1 submitted 6 February, 2012;
originally announced February 2012.
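The two goodness-of-fit statistics compared in the abstract above can be computed side by side with a minimal sketch. The function names and the counts are hypothetical and chosen only for illustration; the G statistic is taken here in its usual form, 2n times the information divergence (in nats) from the empirical distribution to the hypothesized one. Two bins with a fixed total correspond to one degree of freedom.

```python
import math


def chi_square_statistic(observed, expected):
    """Pearson chi square statistic: sum of (O - E)^2 / E over the bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def g_statistic(observed, expected):
    """G statistic: 2 * sum of O * ln(O / E); empty bins contribute zero."""
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)


# Hypothetical counts: 40 observations in two bins, uniform null hypothesis.
observed = [18, 22]
expected = [20.0, 20.0]

print("chi square:", chi_square_statistic(observed, expected))
print("G:", g_statistic(observed, expected))
```

Near the null hypothesis the two statistics take nearly the same value; the abstract concerns how well the distribution of each one is approximated by a chi square distribution, which this sketch does not attempt to assess.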
-
Lower bounds on Information Divergence
Authors:
Peter Harremoës,
Christophe Vignat
Abstract:
In this paper we establish lower bounds on information divergence from a distribution to certain important classes of distributions such as Gaussian, exponential, Gamma, Poisson, geometric, and binomial. These lower bounds are tight and for several convergence theorems where a rate of convergence can be computed, this rate is determined by the lower bounds proved in this paper. General techniques for getting lower bounds in terms of moments are developed.
Submitted 12 February, 2011;
originally announced February 2011.
-
Is Zero a Natural Number?
Authors:
Peter Harremoës
Abstract:
It is argued that zero should be considered as a cardinal number but not an ordinal number. One should make a clear distinction between order types that are labels for well-ordered sets and ordinal numbers that are labels for the elements in these sets.
Submitted 2 February, 2011;
originally announced February 2011.
-
On Pairs of $f$-divergences and their Joint Range
Authors:
Peter Harremoës,
Igor Vajda
Abstract:
We compare two f-divergences and prove that their joint range is the convex hull of the joint range for distributions supported on only two points. Some applications of this result are given.
Submitted 1 July, 2010;
originally announced July 2010.
-
On Bahadur Efficiency of Power Divergence Statistics
Authors:
Peter Harremoës,
Igor Vajda
Abstract:
It is proved that the information divergence statistic is infinitely more Bahadur efficient than the power divergence statistics of the orders $α>1$ as long as the sequence of alternatives is contiguous with respect to the sequence of null-hypotheses and the number of observations per bin increases to infinity at a rate that is not very slow. This improves the former result in Harremoës and Vajda (2008), where the sequence of null-hypotheses was assumed to be uniform and the restrictions on the numbers of observations per bin were sharper. Moreover, this paper also evaluates the Bahadur efficiency of the power divergence statistics of the remaining positive orders $0< α\leq 1.$ The statistics of these orders are mutually Bahadur-comparable and all of them are more Bahadur efficient than the statistics of the orders $α> 1.$ A detailed discussion of the technical definitions and conditions is given, some unclear points are resolved, and the results are illustrated by examples.
Submitted 7 February, 2010;
originally announced February 2010.
-
Rényi Divergence and Majorization
Authors:
Tim van Erven,
Peter Harremoës
Abstract:
Rényi divergence is related to Rényi entropy much like information divergence (also called Kullback-Leibler divergence or relative entropy) is related to Shannon's entropy, and comes up in many settings. It was introduced by Rényi as a measure of information that satisfies almost the same axioms as information divergence. We review the most important properties of Rényi divergence, including its relation to some other distances. We show how Rényi divergence appears when the theory of majorization is generalized from the finite to the continuous setting. Finally, Rényi divergence plays a role in analyzing the number of binary questions required to guess the values of a sequence of random variables.
Submitted 27 May, 2010; v1 submitted 25 January, 2010;
originally announced January 2010.
-
Joint Range of f-divergences
Authors:
Peter Harremoës,
Igor Vajda
Abstract:
We provide a general method for evaluation of the joint range of f-divergences for two different functions f. Via topological arguments we prove that the joint range for general distributions equals the convex hull of the joint range achieved by the distributions on a two-element set. The joint range technique provides important inequalities between different f-divergences with various applications in information theory and statistics.
Submitted 27 May, 2010; v1 submitted 25 January, 2010;
originally announced January 2010.
-
Thinning, Entropy and the Law of Thin Numbers
Authors:
Peter Harremoes,
Oliver Johnson,
Ioannis Kontoyiannis
Abstract:
Renyi's "thinning" operation on a discrete random variable is a natural discrete analog of the scaling operation for continuous random variables. The properties of thinning are investigated in an information-theoretic context, especially in connection with information-theoretic inequalities related to Poisson approximation results. The classical Binomial-to-Poisson convergence (sometimes referred to as the "law of small numbers") is seen to be a special case of a thinning limit theorem for convolutions of discrete distributions. A rate of convergence is provided for this limit, and nonasymptotic bounds are also established. This development parallels, in part, the development of Gaussian inequalities leading to the information-theoretic version of the central limit theorem. In particular, a "thinning Markov chain" is introduced, and it is shown to play a role analogous to that of the Ornstein-Uhlenbeck process in connection to the entropy power inequality.
Submitted 3 June, 2009;
originally announced June 2009.
-
Joint Range of Rényi Entropies
Authors:
Peter Harremoës
Abstract:
The exact range of the joint values of several Rényi entropies is determined. The method is based on topology with special emphasis on the orientation of the objects studied. As in the case where only two orders of Rényi entropies are studied, one can parametrize upper and lower bounds, but an explicit formula for a tight upper or lower bound cannot be given.
Submitted 16 April, 2009;
originally announced April 2009.
-
Dutch Books and Combinatorial Games
Authors:
Peter Harremoes
Abstract:
The theory of combinatorial games (like board games) and the theory of social games (where one looks for Nash equilibria) are normally considered as two separate theories. Here we shall see what comes out of combining the ideas. The central idea is Conway's observation that real numbers can be interpreted as special types of combinatorial games. Therefore the payoff function of a social game is a combinatorial game. Probability theory should be considered as a safety net that prevents inconsistent decisions via the Dutch Book Argument. This result can be extended to situations where the payoff function is a more general game than a real number. The main difference between number valued payoff and game valued payoff is that a probability distribution that gives non-negative mean payoff does not ensure that the game will be lost due to the existence of infinitesimal games. Also the Ramsey/de Finetti theorem on exchangeable sequences is discussed.
Submitted 27 May, 2010; v1 submitted 31 March, 2009;
originally announced March 2009.
-
Testing Goodness-of-Fit via Rate Distortion
Authors:
Peter Harremoes
Abstract:
A framework is developed using techniques from rate distortion theory in statistical testing. The idea is first to do optimal compression according to a certain distortion function and then to use the information divergence from the compressed empirical distribution to the compressed null hypothesis as the statistic. Only very special cases have been studied in more detail, but they indicate that the approach can be used under very general conditions.
Submitted 31 March, 2009;
originally announced March 2009.
-
Regret and Jeffreys Integrals in Exp. Families
Authors:
Peter Grunwald,
Peter Harremoes
Abstract:
The problem of whether minimax redundancy, minimax regret and Jeffreys integrals are finite or infinite is discussed.
Submitted 31 March, 2009;
originally announced March 2009.
-
Maximum Entropy on Compact Groups
Authors:
Peter Harremoes
Abstract:
On a compact group the Haar probability measure plays the role of uniform distribution. The entropy and rate distortion theory for this uniform distribution is studied. New results and simplified proofs on convergence of convolutions on compact groups are presented and they can be formulated as entropy increases to its maximum. Information theoretic techniques and Markov chains play a crucial role. The convergence results are also formulated via rate distortion functions. The rate of convergence is shown to be exponential.
Submitted 29 March, 2009; v1 submitted 30 December, 2008;
originally announced January 2009.
-
Properties of Classical and Quantum Jensen-Shannon Divergence
Authors:
Jop Briët,
Peter Harremoës
Abstract:
Jensen-Shannon divergence (JD) is a symmetrized and smoothed version of the most important divergence measure of information theory, Kullback divergence. As opposed to Kullback divergence it determines in a very direct way a metric; indeed, it is the square of a metric. We consider a family of divergence measures (JD_alpha for alpha>0), the Jensen divergences of order alpha, which generalize JD as JD_1=JD. Using a result of Schoenberg, we prove that JD_alpha is the square of a metric when alpha lies in the interval (0,2], and that the resulting metric space of probability distributions can be isometrically embedded in a real Hilbert space. Quantum Jensen-Shannon divergence (QJD) is a symmetrized and smoothed version of quantum relative entropy and can be extended to a family of quantum Jensen divergences of order alpha (QJD_alpha). We strengthen results by Lamberti et al. by proving that for qubits and pure states, QJD_alpha^1/2 is a metric space which can be isometrically embedded in a real Hilbert space when alpha lies in the interval (0,2]. In analogy with Burbea and Rao's generalization of JD, we also define general QJD by associating a Jensen-type quantity to any weighted family of states. Appropriate interpretations of quantities introduced are discussed and bounds are derived in terms of the total variation and trace distance.
Submitted 14 April, 2009; v1 submitted 27 June, 2008;
originally announced June 2008.
-
Interpretations of Renyi Entropies And Divergences
Authors:
Peter Harremoes
Abstract:
In this paper a new operational definition of Renyi entropy and Renyi divergence is presented. Other operational definitions are mentioned.
Submitted 30 September, 2005;
originally announced October 2005.
-
Entropy and the Law of Small Numbers
Authors:
Ioannis Kontoyiannis,
Peter Harremoes,
Oliver Johnson
Abstract:
Two new information-theoretic methods are introduced for establishing Poisson approximation inequalities. First, using only elementary information-theoretic techniques it is shown that, when $S_n=\sum_{i=1}^n X_i$ is the sum of the (possibly dependent) binary random variables $X_1,X_2,\ldots,X_n$, with $E(X_i)=p_i$ and $E(S_n)=\lambda$, then $D(P_{S_n}\|\mathrm{Po}(\lambda))\leq \sum_{i=1}^n p_i^2 + \Big[\sum_{i=1}^n H(X_i) - H(X_1,X_2,\ldots,X_n)\Big]$, where $D(P_{S_n}\|\mathrm{Po}(\lambda))$ is the relative entropy between the distribution of $S_n$ and the Poisson($\lambda$) distribution. The first term in this bound measures the individual smallness of the $X_i$ and the second term measures their dependence. A general method is outlined for obtaining corresponding bounds when approximating the distribution of a sum of general discrete random variables by an infinitely divisible distribution.
Second, in the particular case when the $X_i$ are independent, the following sharper bound is established, $D(P_{S_n}\|\mathrm{Po}(\lambda))\leq \frac{1}{\lambda} \sum_{i=1}^n \frac{p_i^3}{1-p_i}$, and it is also generalized to the case when the $X_i$ are general integer-valued random variables. Its proof is based on the derivation of a subadditivity property for a new discrete version of the Fisher information, and uses a recent logarithmic Sobolev inequality for the Poisson distribution.
Submitted 17 November, 2004; v1 submitted 1 November, 2002;
originally announced November 2002.