Search | arXiv e-print repository

Universal Reverse Information Projections and Optimal E-statistics

Authors: Tyron Lardy, Peter Grünwald, Peter Harremoës

Abstract: Information projections have found important applications in probability theory, statistics, and related areas. In the field of hypothesis testing in particular, the reverse information projection (RIPr) has recently been shown to lead to so-called growth-rate optimal (GRO) e-statistics for testing simple alternatives against composite null hypotheses. However, the RIPr as well as the GRO criterion are undefined whenever the infimum information divergence between the null and alternative is infinite. We show that in such scenarios there often still exists an element in the alternative that is 'closest' to the null: the universal reverse information projection. The universal reverse information projection and its non-universal counterpart coincide whenever information divergence is finite. Furthermore, the universal RIPr is shown to lead to optimal e-statistics in a sense that is a novel, but natural, extension of the GRO criterion. We also give conditions under which the universal RIPr is a strict sub-probability distribution, as well as conditions under which an approximation of the universal RIPr leads to approximate e-statistics. For this case we provide tight relations between the corresponding approximation rates.

Submitted 4 December, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

Comments: A five-page abstract of this paper, containing a subset of the theorems but no proofs, was presented at ISIT 2023, Taipei

MSC Class: 62B10 (primary); 94A17 (secondary)

arXiv:2202.02668 [pdf, other]

Unnormalized Measures in Information Theory

Authors: Peter Harremoës

Abstract: Information theory is built on probability measures and by definition a probability measure has total mass 1. Probability measures are used to model uncertainty, and one may ask how important it is that the total mass is one. We claim that the main reason to normalize measures is that probability measures are related to codes via Kraft's inequality. Using a minimum description length approach to statistics we will demonstrate that measures that are not normalized require a new interpretation that we will call the Poisson interpretation. With the Poisson interpretation many problems can be simplified. The focus will shift from probabilities to mean values. We give examples of improvements of test procedures, improved inequalities, simplified algorithms, new projection results, and improvements in our description of quantum systems.

Submitted 5 February, 2022; originally announced February 2022.

Comments: 6 pages, 3 figures

MSC Class: 94A17

arXiv:2201.03707 [pdf, other]

doi 10.3390/e25030456

Rate Distortion Theory for Descriptive Statistics

Authors: Peter Harremoës

Abstract: Rate distortion theory was developed for optimizing lossy compression of data, but it also has a lot of applications in statistics. In this paper we will see how rate distortion theory can be used to analyze a complicated data set involving orientations of early Islamic mosques. The analysis involves testing, identification of outliers, choice of compression rate, calculation of optimal reconstruction points, and assigning "descriptive confidence regions" to the reconstruction points. In this paper the focus will be on the methods, so the integrity of the data set and the interpretation of the results will not be discussed.

Submitted 16 February, 2022; v1 submitted 10 January, 2022; originally announced January 2022.

Comments: 6 pages, 4 figures

MSC Class: 94-10; 94A34

arXiv:2002.03002 [pdf, other]

Bounds on the Information Divergence for Hypergeometric Distributions

Authors: Peter Harremoës, František Matúš

Abstract: The hypergeometric distributions have many important applications, but they have not had sufficient attention in information theory. Hypergeometric distributions can be approximated by binomial distributions or Poisson distributions. In this paper we present upper and lower bounds on information divergence. These bounds are important for statistical testing and a better understanding of the notion of exchangeability.

Submitted 7 February, 2020; originally announced February 2020.

Comments: 21 pages, 2 figures

MSC Class: 60E15; 94A17

arXiv:2002.02895 [pdf, ps, other]

doi 10.1007/s40509-020-00222-w

From Thermodynamic Sufficiency to Information Causality

Authors: Peter Harremoës

Abstract: The principle called information causality has been used to deduce Tsirelson's bound. In this paper we derive information causality from monotonicity of divergence and relate it to more basic principles related to measurements on thermodynamic systems. This principle is more fundamental in the sense that it can be formulated for both unipartite systems and multipartite systems while information causality is only defined for multipartite systems. Thermodynamic sufficiency is a strong condition that puts severe restrictions on the shape of the state space, to an extent that we conjecture that under very weak regularity conditions it can be used to deduce the complex Hilbert space formalism of quantum theory. Since the notion of sufficiency is relevant for all convex optimization problems there are many examples where it does not apply.

Submitted 7 February, 2020; originally announced February 2020.

Comments: 11 pages

MSC Class: 81P16

arXiv:1805.02234 [pdf, ps, other]

Statistical Inference and Exact Saddle Point Approximations

Authors: Peter Harremoës

Abstract: Statistical inference may follow a frequentist approach or it may follow a Bayesian approach or it may use the minimum description length principle (MDL). Our goal is to identify situations in which these different approaches to statistical inference coincide. It is proved that for exponential families MDL and Bayesian inference coincide if and only if the renormalized saddle point approximation for the conjugated exponential family is exact. For 1-dimensional exponential families the only families with exact renormalized saddle point approximations are the Gaussian location family, the Gamma family and the inverse Gaussian family. They are conjugated families of the Gaussian location family, the Gamma family and the Poisson-exponential family. The first two families are self-conjugated, implying that only for the first two families the Bayesian approach is consistent with the frequentist approach. In higher dimensions there are more examples.

Submitted 6 May, 2018; originally announced May 2018.

Comments: 5 pages

MSC Class: 62B10

arXiv:1707.03222 [pdf, ps, other]

Entropy on Spin Factors

Authors: Peter Harremoës

Abstract: Recently it has been demonstrated that the Shannon entropy or the von Neumann entropy are the only entropy functions that generate local Bregman divergences as long as the state space has rank 3 or higher. In this paper we will study the properties of Bregman divergences for convex bodies of rank 2. The two most important convex bodies of rank 2 can be identified with the bit and the qubit. We demonstrate that if a convex body of rank 2 has a Bregman divergence that satisfies sufficiency then the convex body is spectral and if the Bregman divergence is monotone then the convex body has the shape of a ball. A ball can be represented as the state space of a spin factor, which is the simplest type of Jordan algebra. We also study the existence of recovery maps for Bregman divergences on spin factors. In general the convex bodies of rank 2 appear as faces of state spaces of higher rank. Therefore our results give strong restrictions on which convex bodies could be the state space of a physical system with a well-behaved entropy function.

Submitted 4 May, 2018; v1 submitted 11 July, 2017; originally announced July 2017.

Comments: 30 pages, 6 figures

MSC Class: 81P16

arXiv:1701.06688 [pdf, ps, other]

Quantum Information on Spectral Sets

Authors: Peter Harremoës

Abstract: For convex optimization problems Bregman divergences appear as regret functions. Such regret functions can be defined on any convex set but if a sufficiency condition is added the regret function must be proportional to information divergence and the convex set must be spectral. Spectral sets are sets where different orthogonal decompositions of a state into pure states have unique mixing coefficients. Only on such spectral sets is it possible to define well-behaved information theoretic quantities like entropy and divergence. It is only possible to perform measurements in a reversible way if the state space is spectral. The most important spectral sets can be represented as positive elements of Jordan algebras with trace 1. This means that Jordan algebras provide a natural framework for studying quantum information. We compare information theory on Hilbert spaces with information theory in more general Jordan algebras, and conclude that much of the formalism is unchanged but also identify some important differences.

Submitted 10 February, 2017; v1 submitted 23 January, 2017; originally announced January 2017.

Comments: 13 pages, 2 figures. arXiv admin note: text overlap with arXiv:1701.01010

MSC Class: 81P16; 94B75

arXiv:1701.01010 [pdf, other]

doi 10.3390/e19050206

Divergence and Sufficiency for Convex Optimization

Authors: Peter Harremoës

Abstract: Logarithmic score and information divergence appear in information theory, statistics, statistical mechanics, and portfolio theory. We demonstrate that all these topics involve some kind of optimization that leads directly to regret functions and such regret functions are often given by a Bregman divergence. If the regret function also fulfills a sufficiency condition it must be proportional to information divergence. We will demonstrate that sufficiency is equivalent to the apparently weaker notion of locality and it is also equivalent to the apparently stronger notion of monotonicity. These sufficiency conditions have quite different relevance in the different areas of application, and often they are not fulfilled. Therefore sufficiency conditions can be used to explain when results from one area can be transferred directly to another and when one will experience differences.

Submitted 10 April, 2017; v1 submitted 4 January, 2017; originally announced January 2017.

Comments: 39 pages, 3 figures

MSC Class: 94A17

arXiv:1607.02259 [pdf, ps, other]

doi 10.3390/e19050206

Maximum Entropy and Sufficiency

Authors: Peter Harremoës

Abstract: The notion of Bregman divergence and sufficiency will be defined on general convex state spaces. It is demonstrated that only spectral sets can have a Bregman divergence that satisfies a sufficiency condition. Positive elements with trace 1 in a Jordan algebra are examples of spectral sets, and the most important example is the set of density matrices with complex entries. It is conjectured that information theoretic considerations lead directly to the notion of Jordan algebra under some regularity conditions.

Submitted 3 September, 2016; v1 submitted 8 July, 2016; originally announced July 2016.

MSC Class: 81P16; 94A17

arXiv:1601.07593 [pdf, ps, other]

Sufficiency on the Stock Market

Authors: Peter Harremoës

Abstract: It is well-known that there are a number of relations between theoretical finance theory and information theory. Some of these relations are exact and some are approximate. In this paper we will explore some of these relations and determine under which conditions the relations are exact. It turns out that portfolio theory always leads to Bregman divergences. The Bregman divergence is only proportional to information divergence in situations that are essentially equal to the type of gambling studied by Kelly. This can be related to an abstract sufficiency condition.

Submitted 27 January, 2016; originally announced January 2016.

MSC Class: 91B25

arXiv:1601.05179 [pdf, other]

doi 10.14736/kyb-2016-6-0943

Bounds on Tail Probabilities in Exponential families

Authors: Peter Harremoës

Abstract: In this paper we present various new inequalities for tail probabilities for distributions that are elements of the most important exponential families. These families include the Poisson distributions, the Gamma distributions, the binomial distributions, the negative binomial distributions and the inverse Gaussian distributions. All these exponential families have simple variance functions and the variance functions play an important role in the exposition. All the inequalities presented in this paper are formulated in terms of the signed log-likelihood. The inequalities are of a qualitative nature in that they can be formulated either in terms of stochastic domination or in terms of an intersection property that states that a certain discrete distribution is very close to a certain continuous distribution.

Submitted 8 February, 2016; v1 submitted 20 January, 2016; originally announced January 2016.

Comments: 27 pages, 10 figures

MSC Class: 60E15

Journal ref: Kybernetika 51, 943-966 (2016)

arXiv:1601.04255 [pdf, ps, other]

Thinning and Information Projections

Authors: Peter Harremoës, Oliver Johnson, Ioannis Kontoyiannis

Abstract: In this paper we establish lower bounds on information divergence of a distribution on the integers from a Poisson distribution. These lower bounds are tight and in the cases where a rate of convergence in the Law of Thin Numbers can be computed the rate is determined by the lower bounds proved in this paper. General techniques for getting lower bounds in terms of moments are developed. The results about lower bounds in the Law of Thin Numbers are used to derive similar results for the Central Limit Theorem.

Submitted 17 January, 2016; originally announced January 2016.

MSC Class: 60F99; 94A11

arXiv:1507.07089 [pdf, other]

Proper Scoring and Sufficiency

Authors: Peter Harremoës

Abstract: Logarithmic score and information divergence appear in information theory, statistics, statistical mechanics, and portfolio theory. We demonstrate that all these topics involve some kind of optimization that leads directly to the use of Bregman divergences. If a sufficiency condition is also fulfilled the Bregman divergence must be proportional to information divergence. The sufficiency condition has quite different consequences in the different areas of application, and often it is not fulfilled. Therefore the sufficiency condition can be used to explain when results from one area can be transferred directly to another and when one will experience differences.

Submitted 25 July, 2015; originally announced July 2015.

Comments: Proceedings WITMSE 2015

MSC Class: 62B10; 94A17

arXiv:1502.04336 [pdf, ps, other]

Lattices with non-Shannon Inequalities

Authors: Peter Harremoës

Abstract: We study the existence or absence of non-Shannon inequalities for variables that are related by functional dependencies. Although the power-set on four variables is the smallest Boolean lattice with non-Shannon inequalities there exist lattices with many more variables without non-Shannon inequalities. We search for conditions that ensure that no non-Shannon inequalities exist. It is demonstrated that 3-dimensional distributive lattices cannot have non-Shannon inequalities and planar modular lattices cannot have non-Shannon inequalities. The existence of non-Shannon inequalities is related to the question of whether a lattice is isomorphic to a lattice of subgroups of a group.

Submitted 15 February, 2015; originally announced February 2015.

Comments: Ten pages. Submitted to ISIT 2015. The appendix will not appear in the proceedings

arXiv:1402.0092 [pdf, other]

Mutual information of Contingency Tables and Related Inequalities

Authors: Peter Harremoës

Abstract: For testing independence it is very popular to use either the $χ^{2}$-statistic or the $G^{2}$-statistic (mutual information). Asymptotically both are $χ^{2}$-distributed so an obvious question is which of the two statistics has a distribution that is closest to the $χ^{2}$-distribution. Surprisingly the distribution of mutual information is much better approximated by a $χ^{2}$-distribution than the $χ^{2}$-statistic. For technical reasons we shall focus on the simplest case with one degree of freedom. We introduce the signed log-likelihood and demonstrate that its distribution function can be related to the distribution function of a standard Gaussian by inequalities. For the hypergeometric distribution we formulate a general conjecture about how close the signed log-likelihood is to a standard Gaussian, and this conjecture gives much more accurate estimates of the tail probabilities of this type of distribution than previously published results. The conjecture has been proved numerically in all cases relevant for testing independence and further evidence of its validity is given.

Submitted 1 February, 2014; originally announced February 2014.

Comments: A version without the appendix has been submitted to a conference

arXiv:1305.4324 [pdf, ps, other]

Horizon-Independent Optimal Prediction with Log-Loss in Exponential Families

Authors: Peter Bartlett, Peter Grunwald, Peter Harremoes, Fares Hedayati, Wojciech Kotlowski

Abstract: We study online learning under logarithmic loss with regular parametric models. Hedayati and Bartlett (2012b) showed that a Bayesian prediction strategy with Jeffreys prior and sequential normalized maximum likelihood (SNML) coincide and are optimal if and only if the latter is exchangeable, and if and only if the optimal strategy can be calculated without knowing the time horizon in advance. They put forward the question what families have exchangeable SNML strategies. This paper fully answers this open problem for one-dimensional exponential families. The exchangeability can happen only for three classes of natural exponential family distributions, namely the Gaussian, Gamma, and the Tweedie exponential family of order 3/2. Keywords: SNML Exchangeability, Exponential Family, Online Learning, Logarithmic Loss, Bayesian Strategy, Jeffreys Prior, Fisher Information

Submitted 19 May, 2013; originally announced May 2013.

Comments: 23 pages

arXiv:1301.6465 [pdf, ps, other]

Extendable MDL

Authors: Peter Harremoës

Abstract: In this paper we show that a combination of the minimum description length principle and an exchangeability condition leads directly to the use of Jeffreys prior. This approach works in most cases even when Jeffreys prior cannot be normalized. Kraft's inequality links codes and distributions but a closer look at this inequality demonstrates that this link only makes sense when sequences are considered as prefixes of potential longer sequences. For technical reasons only results for exponential families are stated. Results on when Jeffreys prior can be normalized after conditioning on an initializing string are given. An exotic case where no initial string allows Jeffreys prior to be normalized is given and some ways of handling such exotic cases are discussed.

Submitted 19 May, 2013; v1 submitted 28 January, 2013; originally announced January 2013.

Comments: 9 pages

MSC Class: 62B10; 94A15

arXiv:1206.6544 [pdf, ps, other]

Minimum KL-divergence on complements of $L_1$ balls

Authors: Daniel Berend, Peter Harremoës, Aryeh Kontorovich

Abstract: Pinsker's widely used inequality upper-bounds the total variation distance $||P-Q||_1$ in terms of the Kullback-Leibler divergence $D(P||Q)$. Although in general a bound in the reverse direction is impossible, in many applications the quantity of interest is actually $D^*(P,\eps)$ --- defined, for an arbitrary fixed $P$, as the infimum of $D(P||Q)$ over all distributions $Q$ that are $\eps$-far away from $P$ in total variation. We show that $D^*(P,\eps)\le C\eps^2 + O(\eps^3)$, where $C=C(P)=1/2$ for "balanced" distributions, thereby providing a kind of reverse Pinsker inequality. An application to large deviations is given, and some of the structural results may be of independent interest. Keywords: Pinsker inequality, Sanov's theorem, large deviations

Submitted 20 February, 2014; v1 submitted 27 June, 2012; originally announced June 2012.

Comments: A previous version had the title "A Reverse Pinsker Inequality"

MSC Class: 60F10; 94A15

arXiv:1206.2459 [pdf, other]

doi 10.1109/TIT.2014.2320500

Rényi Divergence and Kullback-Leibler Divergence

Authors: Tim van Erven, Peter Harremoës

Abstract: Rényi divergence is related to Rényi entropy much like Kullback-Leibler divergence is related to Shannon's entropy, and comes up in many settings. It was introduced by Rényi as a measure of information that satisfies almost the same axioms as Kullback-Leibler divergence, and depends on a parameter that is called its order. In particular, the Rényi divergence of order 1 equals the Kullback-Leibler divergence. We review and extend the most important properties of Rényi divergence and Kullback-Leibler divergence, including convexity, continuity, limits of $σ$-algebras and the relation of the special order 0 to the Gaussian dichotomy and contiguity. We also show how to generalize the Pythagorean inequality to orders different from 1, and we extend the known equivalence between channel capacity and minimax redundancy to continuous channel inputs (for all orders) and present several other minimax results.

Submitted 24 April, 2014; v1 submitted 12 June, 2012; originally announced June 2012.

Comments: To appear in IEEE Transactions on Information Theory

arXiv:1205.1005 [pdf, ps, other]

Some Refinements of Large Deviation Tail Probabilities

Authors: Laszlo Gyorfi, Peter Harremoes, Gabor Tusnady

Abstract: We study tail probabilities via some Gaussian approximations. Our results make refinements to large deviation theory. The proof builds on classical results by Bahadur and Rao. Binomial distributions and their tail probabilities are discussed in more detail.

Submitted 4 May, 2012; originally announced May 2012.

Comments: 7 pages

MSC Class: 60F10; 60E15

arXiv:1202.1125 [pdf, ps, other]

Information Divergence is more chi squared distributed than the chi squared statistics

Authors: Peter Harremoës, Gábor Tusnády

Abstract: For testing goodness of fit it is very popular to use either the chi square statistic or the G statistic (information divergence). Asymptotically both are chi square distributed so an obvious question is which of the two statistics has a distribution that is closest to the chi square distribution. Surprisingly, when there is only one degree of freedom it seems like the distribution of information divergence is much better approximated by a chi square distribution than the chi square statistic. For random variables we introduce a new transformation that transforms several important distributions into new random variables that are almost Gaussian. For the binomial distributions and the Poisson distributions we formulate a general conjecture about how close their transforms are to the Gaussian. The conjecture is proved for Poisson distributions.

Submitted 17 June, 2012; v1 submitted 6 February, 2012; originally announced February 2012.

Comments: 5 pages, accepted for presentation at ISIT 2012

MSC Class: 62E15

arXiv:1102.2536 [pdf, ps, other]

Lower bounds on Information Divergence

Authors: Peter Harremoës, Christophe Vignat

Abstract: In this paper we establish lower bounds on information divergence from a distribution to certain important classes of distributions such as Gaussian, exponential, Gamma, Poisson, geometric, and binomial. These lower bounds are tight, and for several convergence theorems where a rate of convergence can be computed, this rate is determined by the lower bounds proved in this paper. General techniques for getting lower bounds in terms of moments are developed.

Submitted 12 February, 2011; originally announced February 2011.

Comments: Submitted for the conference ISIT 2011

MSC Class: 94A15

arXiv:1102.0418 [pdf, ps, other]

Is Zero a Natural Number?

Authors: Peter Harremoës

Abstract: It is argued that zero should be considered as a cardinal number but not an ordinal number. One should make a clear distinction between order types that are labels for well-ordered sets and ordinal numbers that are labels for the elements in these sets.

Submitted 2 February, 2011; originally announced February 2011.

MSC Class: 03E10

arXiv:1007.0097 [pdf, ps, other]

On Pairs of $f$-divergences and their Joint Range

Authors: Peter Harremoës, Igor Vajda

Abstract: We compare two f-divergences and prove that their joint range is the convex hull of the joint range for distributions supported on only two points. Some applications of this result are given.

Submitted 1 July, 2010; originally announced July 2010.

Comments: 7 pages, 4 figures

MSC Class: 94A17; 26Dxx

arXiv:1002.1493 [pdf, ps, other]

On Bahadur Efficiency of Power Divergence Statistics

Authors: Peter Harremoës, Igor Vajda

Abstract: It is proved that the information divergence statistic is infinitely more Bahadur efficient than the power divergence statistics of the orders $α>1$ as long as the sequence of alternatives is contiguous with respect to the sequence of null-hypotheses and the number of observations per bin increases to infinity not too slowly. This improves the former result in Harremoës and Vajda (2008) where the sequence of null-hypotheses was assumed to be uniform and the restrictions on the numbers of observations per bin were sharper. Moreover, this paper also evaluates the Bahadur efficiency of the power divergence statistics of the remaining positive orders $0 < α \leq 1$. The statistics of these orders are mutually Bahadur-comparable and all of them are more Bahadur efficient than the statistics of the orders $α > 1$. A detailed discussion of the technical definitions and conditions is given, some unclear points are resolved, and the results are illustrated by examples.

Submitted 7 February, 2010; originally announced February 2010.

arXiv:1001.4448 [pdf, ps, other]

Rényi Divergence and Majorization

Authors: Tim van Erven, Peter Harremoës

Abstract: Rényi divergence is related to Rényi entropy much like information divergence (also called Kullback-Leibler divergence or relative entropy) is related to Shannon's entropy, and comes up in many settings. It was introduced by Rényi as a measure of information that satisfies almost the same axioms as information divergence. We review the most important properties of Rényi divergence, including its relation to some other distances. We show how Rényi divergence appears when the theory of majorization is generalized from the finite to the continuous setting. Finally, Rényi divergence plays a role in analyzing the number of binary questions required to guess the values of a sequence of random variables.

Submitted 27 May, 2010; v1 submitted 25 January, 2010; originally announced January 2010.

MSC Class: 94A17

arXiv:1001.4432 [pdf, ps, other]

Joint Range of f-divergences

Authors: Peter Harremoës, Igor Vajda

Abstract: We provide a general method for evaluation of the joint range of f-divergences for two different functions f. Via topological arguments we prove that the joint range for general distributions equals the convex hull of the joint range achieved by the distributions on a two-element set. The joint range technique provides important inequalities between different f-divergences with various applications in information theory and statistics.

Submitted 27 May, 2010; v1 submitted 25 January, 2010; originally announced January 2010.

Comments: Accepted for presentation at ISIT 2010

arXiv:0906.0690 [pdf, ps, other]

doi 10.1109/TIT.2010.2053893

Thinning, Entropy and the Law of Thin Numbers

Authors: Peter Harremoes, Oliver Johnson, Ioannis Kontoyiannis

Abstract: Renyi's "thinning" operation on a discrete random variable is a natural discrete analog of the scaling operation for continuous random variables. The properties of thinning are investigated in an information-theoretic context, especially in connection with information-theoretic inequalities related to Poisson approximation results. The classical Binomial-to-Poisson convergence (sometimes referred to as the "law of small numbers") is seen to be a special case of a thinning limit theorem for convolutions of discrete distributions. A rate of convergence is provided for this limit, and nonasymptotic bounds are also established. This development parallels, in part, the development of Gaussian inequalities leading to the information-theoretic version of the central limit theorem. In particular, a "thinning Markov chain" is introduced, and it is shown to play a role analogous to that of the Ornstein-Uhlenbeck process in connection to the entropy power inequality.

Submitted 3 June, 2009; originally announced June 2009.

Journal ref: IEEE Transactions on Information Theory, Vol 56/9, 2010, pages 4228-4244

arXiv:0904.2477 [pdf, other]

Joint Range of Rényi Entropies

Authors: Peter Harremoës

Abstract: The exact range of the joined values of several Rényi entropies is determined. The method is based on topology with special emphasis on the orientation of the objects studied. Like in the case when only two orders of Rényi entropies are studied one can parametrize upper and lower bounds but an explicit formula for a tight upper or lower bound cannot be given.

Submitted 16 April, 2009; originally announced April 2009.

MSC Class: 94A17; 62B10

arXiv:0903.5429 [pdf, ps, other]

Dutch Books and Combinatorial Games

Authors: Peter Harremoes

Abstract: The theory of combinatorial games (like board games) and the theory of social games (where one looks for Nash equilibria) are normally considered as two separate theories. Here we shall see what comes out of combining the ideas. The central idea is Conway's observation that real numbers can be interpreted as special types of combinatorial games. Therefore the payoff function of a social game is a combinatorial game. Probability theory should be considered as a safety net that prevents inconsistent decisions via the Dutch Book Argument. This result can be extended to situations where the payoff function is a more general game than a real number. The main difference between number valued payoff and game valued payoff is that a probability distribution that gives non-negative mean payoff does not ensure that the game will be lost due to the existence of infinitesimal games. Also the Ramsey/de Finetti theorem on exchangeable sequences is discussed.

Submitted 27 May, 2010; v1 submitted 31 March, 2009; originally announced March 2009.

MSC Class: 60A05; 91A46

arXiv:0903.5426 [pdf, ps, other]

Testing Goodness-of-Fit via Rate Distortion

Authors: Peter Harremoes

Abstract: A framework is developed using techniques from rate distortion theory in statistical testing. The idea is first to do optimal compression according to a certain distortion function and then use information divergence from the compressed empirical distribution to the compressed null hypothesis as the statistic. Only very special cases have been studied in more detail, but they indicate that the approach can be used under very general conditions.

Submitted 31 March, 2009; originally announced March 2009.

MSC Class: 94A34; 62G10

arXiv:0903.5399 [pdf, ps, other]

Regret and Jeffreys Integrals in Exp. Families

Authors: Peter Grunwald, Peter Harremoes

Abstract: The problem of whether minimax redundancy, minimax regret and Jeffreys integrals are finite or infinite is discussed.

Submitted 31 March, 2009; originally announced March 2009.

arXiv:0901.0015 [pdf, other]

doi 10.3390/e11020222

Maximum Entropy on Compact Groups

Authors: Peter Harremoes

Abstract: On a compact group the Haar probability measure plays the role of uniform distribution. The entropy and rate distortion theory for this uniform distribution is studied. New results and simplified proofs on convergence of convolutions on compact groups are presented and they can be formulated as entropy increases to its maximum. Information theoretic techniques and Markov chains play a crucial role. The convergence results are also formulated via rate distortion functions. The rate of convergence is shown to be exponential.

Submitted 29 March, 2009; v1 submitted 30 December, 2008; originally announced January 2009.

Journal ref: Entropy 2009, 11(2), 222-237

arXiv:0806.4472 [pdf, other]

doi 10.1103/PhysRevA.79.052311

Properties of Classical and Quantum Jensen-Shannon Divergence

Authors: Jop Briët, Peter Harremoës

Abstract: Jensen-Shannon divergence (JD) is a symmetrized and smoothed version of the most important divergence measure of information theory, Kullback divergence. As opposed to Kullback divergence it determines in a very direct way a metric; indeed, it is the square of a metric. We consider a family of divergence measures (JD_alpha for alpha>0), the Jensen divergences of order alpha, which generalize JD as JD_1=JD. Using a result of Schoenberg, we prove that JD_alpha is the square of a metric when alpha lies in the interval (0,2], and that the resulting metric space of probability distributions can be isometrically embedded in a real Hilbert space. Quantum Jensen-Shannon divergence (QJD) is a symmetrized and smoothed version of quantum relative entropy and can be extended to a family of quantum Jensen divergences of order alpha (QJD_alpha). We strengthen results by Lamberti et al. by proving that for qubits and pure states, QJD_alpha^1/2 is a metric space which can be isometrically embedded in a real Hilbert space when alpha lies in the interval (0,2]. In analogy with Burbea and Rao's generalization of JD, we also define general QJD by associating a Jensen-type quantity to any weighted family of states. Appropriate interpretations of quantities introduced are discussed and bounds are derived in terms of the total variation and trace distance.

Submitted 14 April, 2009; v1 submitted 27 June, 2008; originally announced June 2008.

Comments: 13 pages, LaTeX, expanded contents, added references and corrected typos

Journal ref: Phys. Rev. A 79, 052311 (2009)

arXiv:math-ph/0510002 [pdf]

doi 10.1016/j.physa.2006.01.012

Interpretations of Renyi Entropies And Divergences

Authors: Peter Harremoes

Abstract: In this paper a new operational definition of Renyi entropy and Renyi divergence is presented. Other operational definitions are mentioned.

Submitted 30 September, 2005; originally announced October 2005.

Comments: 10 pages, 1 figure

MSC Class: 94A17; 82B99

arXiv:math/0211020 [pdf, ps, other]

doi 10.1109/TIT.2004.840861

Entropy and the Law of Small Numbers

Authors: Ioannis Kontoyiannis, Peter Harremoes, Oliver Johnson

Abstract: Two new information-theoretic methods are introduced for establishing Poisson approximation inequalities. First, using only elementary information-theoretic techniques it is shown that, when $S_n=\sum_{i=1}^n X_i$ is the sum of the (possibly dependent) binary random variables $X_1,X_2,\ldots,X_n$, with $E(X_i)=p_i$ and $E(S_n)=\lambda$, then $$D(P_{S_n}\|\mathrm{Po}(\lambda)) \leq \sum_{i=1}^n p_i^2 + \Big[\sum_{i=1}^n H(X_i) - H(X_1,X_2,\ldots,X_n)\Big],$$ where $D(P_{S_n}\|\mathrm{Po}(\lambda))$ is the relative entropy between the distribution of $S_n$ and the Poisson($\lambda$) distribution. The first term in this bound measures the individual smallness of the $X_i$ and the second term measures their dependence. A general method is outlined for obtaining corresponding bounds when approximating the distribution of a sum of general discrete random variables by an infinitely divisible distribution. Second, in the particular case when the $X_i$ are independent, the following sharper bound is established, $$D(P_{S_n}\|\mathrm{Po}(\lambda)) \leq \frac{1}{\lambda} \sum_{i=1}^n \frac{p_i^3}{1-p_i},$$ and it is also generalized to the case when the $X_i$ are general integer-valued random variables. Its proof is based on the derivation of a subadditivity property for a new discrete version of the Fisher information, and uses a recent logarithmic Sobolev inequality for the Poisson distribution.

Submitted 17 November, 2004; v1 submitted 1 November, 2002; originally announced November 2002.

Comments: 15 pages. To appear, IEEE Trans Inform Theory

Journal ref: IEEE Transactions on Information Theory, Vol 51/2, 2005, pages 466-472

Showing 1–37 of 37 results for author: Harremoës, P