-
Finite-sample expansions for the optimal error probability in asymmetric binary hypothesis testing
Authors:
Valentinian Lungu,
Ioannis Kontoyiannis
Abstract:
The problem of binary hypothesis testing between two probability measures is considered. New sharp bounds are derived for the best achievable error probability of such tests based on independent and identically distributed observations. Specifically, the asymmetric version of the problem is examined, where different requirements are placed on the two error probabilities. Accurate nonasymptotic expansions with explicit constants are obtained for the error probability, using tools from large deviations and Gaussian approximation. Examples are shown indicating that, in the asymmetric regime, the approximations suggested by the new bounds are significantly more accurate than the approximations provided by either of the two main earlier approaches -- normal approximation and error exponents.
Submitted 29 May, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
-
Relative entropy bounds for sampling with and without replacement
Authors:
Oliver Johnson,
Lampros Gavalakis,
Ioannis Kontoyiannis
Abstract:
Sharp, nonasymptotic bounds are obtained for the relative entropy between the distributions of sampling with and without replacement from an urn with balls of $c\geq 2$ colors. Our bounds are asymptotically tight in certain regimes and, unlike previous results, they depend on the number of balls of each colour in the urn. The connection of these results with finite de Finetti-style theorems is explored, and it is observed that a sampling bound due to Stam (1978) combined with the convexity of relative entropy yield a new finite de Finetti bound in relative entropy, which achieves the optimal asymptotic convergence rate.
Submitted 9 April, 2024;
originally announced April 2024.
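As a rough numerical illustration of the quantity bounded in the entry above, the following Python sketch computes the relative entropy between sampling without and with replacement exactly, for an urn with two colours. It is not taken from the paper: the function name and its parameters are hypothetical, and the direction D(without || with) is an assumption. It uses the fact that both sequence distributions are exchangeable, so the relative entropy between them equals the relative entropy between the hypergeometric and binomial distributions of the colour count.

```python
from math import comb, log


def relative_entropy_sampling(n_balls, n_first_colour, k):
    """Relative entropy D(P || Q), in nats, for k draws from a two-colour urn.

    P: sampling without replacement (hypergeometric count of the first colour).
    Q: sampling with replacement (binomial count of the first colour).
    Requires 0 <= k <= n_balls.
    """
    p = n_first_colour / n_balls
    n_other = n_balls - n_first_colour
    # Support of the hypergeometric distribution.
    lo = max(0, k - n_other)
    hi = min(k, n_first_colour)
    total = 0.0
    for j in range(lo, hi + 1):
        p_without = comb(n_first_colour, j) * comb(n_other, k - j) / comb(n_balls, k)
        q_with = comb(k, j) * p**j * (1 - p) ** (k - j)
        total += p_without * log(p_without / q_with)
    return total


if __name__ == "__main__":
    # Urn with 10 balls, 4 of the first colour; the divergence grows with k.
    for k in (1, 3, 5):
        print(k, relative_entropy_sampling(10, 4, k))
```

A single draw has the same distribution under both schemes, so the value for k = 1 is zero up to rounding; larger k moves the two sampling schemes further apart.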
-
The entropic doubling constant and robustness of Gaussian codebooks for additive-noise channels
Authors:
Lampros Gavalakis,
Ioannis Kontoyiannis,
Mokshay Madiman
Abstract:
Entropy comparison inequalities are obtained for the differential entropy $h(X+Y)$ of the sum of two independent random vectors $X,Y$, when one is replaced by a Gaussian. For identically distributed random vectors $X,Y$, these are closely related to bounds on the entropic doubling constant, which quantifies the entropy increase when adding an independent copy of a random vector to itself. Consequences of both large and small doubling are explored. For the former, lower bounds are deduced on the entropy increase when adding an independent Gaussian, while for the latter, a qualitative stability result for the entropy power inequality is obtained. In the more general case of non-identically distributed random vectors $X,Y$, a Gaussian comparison inequality with interesting implications for channel coding is established: For additive-noise channels with a power constraint, Gaussian codebooks come within a $\frac{\sf snr}{3{\sf snr}+2}$ factor of capacity. In the low-SNR regime this improves the half-a-bit additive bound of Zamir and Erez (2004). Analogous results are obtained for additive-noise multiple access channels, and for linear, additive-noise MIMO channels.
Submitted 11 March, 2024;
originally announced March 2024.
-
Temporally Causal Discovery Tests for Discrete Time Series and Neural Spike Trains
Authors:
A. Theocharous,
G. G. Gregoriou,
P. Sapountzis,
I. Kontoyiannis
Abstract:
We consider the problem of detecting causal relationships between discrete time series, in the presence of potential confounders. A hypothesis test is introduced for identifying the temporally causal influence of $(x_n)$ on $(y_n)$, causally conditioned on a possibly confounding third time series $(z_n)$. Under natural Markovian modeling assumptions, it is shown that the null hypothesis, corresponding to the absence of temporally causal influence, is equivalent to the underlying `causal conditional directed information rate' being equal to zero. The plug-in estimator for this functional is identified with the log-likelihood ratio test statistic for the desired test. This statistic is shown to be asymptotically normal under the alternative hypothesis and asymptotically $χ^2$ distributed under the null, facilitating the computation of $p$-values when used on empirical data. The effectiveness of the resulting hypothesis test is illustrated on simulated data, validating the underlying theory. The test is also employed in the analysis of spike train data recorded from neurons in the V4 and FEF brain regions of behaving animals during a visual attention task. There, the test results are seen to identify interesting and biologically relevant information.
Submitted 17 November, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
A Third Information-Theoretic Approach to Finite de Finetti Theorems
Authors:
Mario Berta,
Lampros Gavalakis,
Ioannis Kontoyiannis
Abstract:
A new finite form of de Finetti's representation theorem is established using elementary information-theoretic tools. The distribution of the first $k$ random variables in an exchangeable vector of $n\geq k$ random variables is close to a mixture of product distributions. Closeness is measured in terms of the relative entropy and an explicit bound is provided. This bound is tighter than those obtained via earlier information-theoretic proofs, and its utility extends to random variables taking values in general spaces. The core argument employed has its origins in the quantum information-theoretic literature.
Submitted 25 April, 2024; v1 submitted 11 April, 2023;
originally announced April 2023.
-
Truly Bayesian Entropy Estimation
Authors:
Ioannis Papageorgiou,
Ioannis Kontoyiannis
Abstract:
Estimating the entropy rate of discrete time series is a challenging problem with important applications in numerous areas including neuroscience, genomics, image processing and natural language processing. A number of approaches have been developed for this task, typically based either on universal data compression algorithms, or on statistical estimators of the underlying process distribution. In this work, we propose a fully-Bayesian approach for entropy estimation. Building on the recently introduced Bayesian Context Trees (BCT) framework for modelling discrete time series as variable-memory Markov chains, we show that it is possible to sample directly from the induced posterior on the entropy rate. This can be used to estimate the entire posterior distribution, providing much richer information than point estimates. We develop theoretical results for the posterior distribution of the entropy rate, including proofs of consistency and asymptotic normality. The practical utility of the method is illustrated on both simulated and real-world data, where it is found to outperform state-of-the-art alternatives.
Submitted 21 March, 2023; v1 submitted 13 December, 2022;
originally announced December 2022.
-
Context-tree weighting and Bayesian Context Trees: Asymptotic and non-asymptotic justifications
Authors:
Ioannis Kontoyiannis
Abstract:
The Bayesian Context Trees (BCT) framework is a recently introduced, general collection of statistical and algorithmic tools for modelling, analysis and inference with discrete-valued time series. The foundation of this development is built in part on some well-known information-theoretic ideas and techniques, including Rissanen's tree sources and Willems et al.'s context-tree weighting algorithm. This paper presents a collection of theoretical results that provide mathematical justifications and further insight into the BCT modelling framework and the associated practical tools. It is shown that the BCT prior predictive likelihood (the probability of a time series of observations averaged over all models and parameters) is both pointwise and minimax optimal, in agreement with the MDL principle and the BIC criterion. The posterior distribution is shown to be asymptotically consistent with probability one (over both models and parameters), and asymptotically Gaussian (over the parameters). And the posterior predictive distribution is also shown to be asymptotically consistent with probability one.
Submitted 5 September, 2023; v1 submitted 4 November, 2022;
originally announced November 2022.
-
Information in probability: Another information-theoretic proof of a finite de Finetti theorem
Authors:
Lampros Gavalakis,
Ioannis Kontoyiannis
Abstract:
We recall some of the history of the information-theoretic approach to deriving core results in probability theory and indicate parts of the recent resurgence of interest in this area with current progress along several interesting directions. Then we give a new information-theoretic proof of a finite version of de Finetti's classical representation theorem for finite-valued random variables. We derive an upper bound on the relative entropy between the distribution of the first $k$ in a sequence of $n$ exchangeable random variables, and an appropriate mixture over product distributions. The mixing measure is characterised as the law of the empirical measure of the original sequence, and de Finetti's result is recovered as a corollary. The proof is nicely motivated by the Gibbs conditioning principle in connection with statistical mechanics, and it follows along an appealing sequence of steps. The technical estimates required for these steps are obtained via the use of a collection of combinatorial tools known within information theory as `the method of types.'
Submitted 26 April, 2022; v1 submitted 11 April, 2022;
originally announced April 2022.
-
Posterior Representations for Bayesian Context Trees: Sampling, Estimation and Convergence
Authors:
Ioannis Papageorgiou,
Ioannis Kontoyiannis
Abstract:
We revisit the Bayesian Context Trees (BCT) modelling framework for discrete time series, which was recently found to be very effective in numerous tasks including model selection, estimation and prediction. A novel representation of the induced posterior distribution on model space is derived in terms of a simple branching process, and several consequences of this are explored in theory and in practice. First, it is shown that the branching process representation leads to a simple variable-dimensional Monte Carlo sampler for the joint posterior distribution on models and parameters, which can efficiently produce independent samples. This sampler is found to be more efficient than earlier MCMC samplers for the same tasks. Then, the branching process representation is used to establish the asymptotic consistency of the BCT posterior, including the derivation of an almost-sure convergence rate. Finally, an extensive study is carried out on the performance of the induced Bayesian entropy estimator. Its utility is illustrated through both simulation experiments and real-world applications, where it is found to outperform several state-of-the-art methods.
Submitted 20 March, 2023; v1 submitted 4 February, 2022;
originally announced February 2022.
-
The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning
Authors:
Vivek Borkar,
Shuhang Chen,
Adithya Devraj,
Ioannis Kontoyiannis,
Sean Meyn
Abstract:
The paper concerns the stochastic approximation recursion, \[ θ_{n+1}= θ_n + α_{n + 1} f(θ_n, Φ_{n+1})
\,,\quad n\ge 0, \] where the {\em estimates} $θ_n\in\Re^d$ and $ \{ Φ_n \}$ is a Markov chain on a general state space. In addition to standard Lipschitz assumptions and conditions on the vanishing step-size sequence, it is assumed that the associated \textit{mean flow} $ \tfrac{d}{dt} \vartheta_t = \bar{f}(\vartheta_t)$, is globally asymptotically stable with stationary point denoted $θ^*$, where $\bar{f}(θ)=\text{ E}[f(θ,Φ)]$ with $Φ$ having the stationary distribution of the chain. The main results are established under additional conditions on the mean flow and a version of the Donsker-Varadhan Lyapunov drift condition known as (DV3) for the chain:
(i) An appropriate Lyapunov function is constructed that implies convergence of the estimates in $L_4$.
(ii) A functional CLT is established, as well as the usual one-dimensional CLT for the normalized error. Moment bounds combined with the CLT imply convergence of the normalized covariance $\text{ E} [ z_n z_n^T ]$ to the asymptotic covariance $Σ^Θ$ in the CLT, where $z_n= (θ_n-θ^*)/\sqrt{α_n}$.
(iii) The CLT holds for the normalized version $z^{\text{ PR}}_n$ of the averaged parameters $θ^{\text{ PR}}_n$, subject to standard assumptions on the step-size. Moreover, the normalized covariance of both $θ^{\text{ PR}}_n$ and $z^{\text{ PR}}_n$ converge to $Σ^{\text{ PR}}$, the minimal covariance of Polyak and Ruppert.
(iv) An example is given where $f$ and $\bar{f}$ are linear in $θ$, and the Markov chain is geometrically ergodic but does not satisfy (DV3). While the algorithm is convergent, the second moment of $θ_n$ is unbounded and in fact diverges.
Submitted 21 February, 2024; v1 submitted 27 October, 2021;
originally announced October 2021.
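The recursion $θ_{n+1}= θ_n + α_{n+1} f(θ_n, Φ_{n+1})$ in the entry above, together with Polyak-Ruppert averaging, can be sketched in a toy setting. Everything specific here is an assumption made for illustration and does not come from the paper: a two-state Markov chain $Φ_n$ on {0, 1}, the linear choice $f(θ, Φ) = Φ - θ$ (so the mean flow has stationary point $θ^*$ equal to the stationary mean of the chain), the step size $α_n = n^{-0.75}$, and the function name `run_sa`.

```python
import random


def run_sa(n_steps=100_000, seed=0):
    """Toy stochastic approximation with Markovian noise and averaging.

    Returns (last iterate, Polyak-Ruppert average, theta_star).
    """
    rng = random.Random(seed)
    p01, p10 = 0.3, 0.2              # transition probabilities 0->1 and 1->0
    theta_star = p01 / (p01 + p10)   # stationary mean of the chain (0.6)
    phi, theta, avg = 0, 0.0, 0.0
    for n in range(1, n_steps + 1):
        # One step of the two-state Markov chain.
        u = rng.random()
        if phi == 0:
            phi = 1 if u < p01 else 0
        else:
            phi = 0 if u < p10 else 1
        alpha = n ** -0.75               # vanishing step size
        theta += alpha * (phi - theta)   # f(theta, phi) = phi - theta
        avg += (theta - avg) / n         # running average of the iterates
    return theta, avg, theta_star


if __name__ == "__main__":
    theta, avg, theta_star = run_sa()
    print("last iterate:", theta)
    print("averaged estimate:", avg)
    print("target:", theta_star)
```

In this bounded, finite-state example both the last iterate and the averaged estimate settle near $θ^*$; the sketch does not attempt to reproduce the divergent second-moment example mentioned in item (iv).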
-
Entropy and the Discrete Central Limit Theorem
Authors:
Lampros Gavalakis,
Ioannis Kontoyiannis
Abstract:
A strengthened version of the central limit theorem for discrete random variables is established, relying only on information-theoretic tools and elementary arguments. It is shown that the relative entropy between the standardised sum of $n$ independent and identically distributed lattice random variables and an appropriately discretised Gaussian, vanishes as $n\to\infty$.
Submitted 1 June, 2021;
originally announced June 2021.
-
The Feature-First Block Model
Authors:
Lawrence Tray,
Ioannis Kontoyiannis
Abstract:
Labelled networks are an important class of data, naturally appearing in numerous applications in science and engineering. A typical inference goal is to determine how the vertex labels (or features) affect the network's structure. In this work, we introduce a new generative model, the feature-first block model (FFBM), that facilitates the use of rich queries on labelled networks. We develop a Bayesian framework and devise a two-level Markov chain Monte Carlo approach to efficiently sample from the relevant posterior distribution of the FFBM parameters. This allows us to infer if and how the observed vertex-features affect macro-structure. We apply the proposed methods to a variety of network data to extract the most important features along which the vertices are partitioned. The main advantages of the proposed approach are that the whole feature-space is used automatically and that features can be rank-ordered implicitly according to impact.
Submitted 16 November, 2021; v1 submitted 28 May, 2021;
originally announced May 2021.
-
An Information-Theoretic Proof of a Finite de Finetti Theorem
Authors:
Lampros Gavalakis,
Ioannis Kontoyiannis
Abstract:
A finite form of de Finetti's representation theorem is established using elementary information-theoretic tools: The distribution of the first $k$ random variables in an exchangeable binary vector of length $n\geq k$ is close to a mixture of product distributions. Closeness is measured in terms of the relative entropy and an explicit bound is provided.
Submitted 25 June, 2021; v1 submitted 8 April, 2021;
originally announced April 2021.
-
Compression and Symmetry of Small-World Graphs and Structures
Authors:
Ioannis Kontoyiannis,
Yi Heng Lim,
Katia Papakonstantinopoulou,
Wojtek Szpankowski
Abstract:
For various purposes and, in particular, in the context of data compression, a graph can be examined at three levels. Its structure can be described as the unlabeled version of the graph; then the labeling of its structure can be added; and finally, given the structure and labeling, the contents of the labels can be described. Determining the amount of information present at each level and quantifying the degree of dependence between them requires the study of symmetry, graph automorphism, entropy, and graph compressibility. In this paper, we focus on a class of small-world graphs. These are geometric random graphs where vertices are first connected to their nearest neighbors on a circle and then pairs of non-neighbors are connected according to a distance-dependent probability distribution. We establish the degree distribution of this model, and use it to prove the model's asymmetry in an appropriate range of parameters. Then we derive the relevant entropy and structural entropy of these random graphs, in connection with graph compression.
Submitted 22 November, 2021; v1 submitted 31 July, 2020;
originally announced July 2020.
-
Bayesian Context Trees: Modelling and exact inference for discrete time series
Authors:
Ioannis Kontoyiannis,
Lambros Mertzanis,
Athina Panotopoulou,
Ioannis Papageorgiou,
Maria Skoularidou
Abstract:
We develop a new Bayesian modelling framework for the class of higher-order, variable-memory Markov chains, and introduce an associated collection of methodological tools for exact inference with discrete time series. We show that a version of the context tree weighting algorithm can compute the prior predictive likelihood exactly (averaged over both models and parameters), and two related algorithms are introduced, which identify the a posteriori most likely models and compute their exact posterior probabilities. All three algorithms are deterministic and have linear-time complexity. A family of variable-dimension Markov chain Monte Carlo samplers is also provided, facilitating further exploration of the posterior. The performance of the proposed methods in model selection, Markov order estimation and prediction is illustrated through simulation experiments and real-world applications with data from finance, genetics, neuroscience, and animal communication. The associated algorithms are implemented in the R package BCT.
Submitted 6 February, 2022; v1 submitted 29 July, 2020;
originally announced July 2020.
-
Sharp Second-Order Pointwise Asymptotics for Lossless Compression with Side Information
Authors:
Lampros Gavalakis,
Ioannis Kontoyiannis
Abstract:
The problem of determining the best achievable performance of arbitrary lossless compression algorithms is examined, when correlated side information is available at both the encoder and decoder. For arbitrary source-side information pairs, the conditional information density is shown to provide a sharp asymptotic lower bound for the description lengths achieved by an arbitrary sequence of compressors. This implies that, for ergodic source-side information pairs, the conditional entropy rate is the best achievable asymptotic lower bound to the rate, not just in expectation but with probability one. Under appropriate mixing conditions, a central limit theorem and a law of the iterated logarithm are proved, describing the inevitable fluctuations of the second-order asymptotically best possible rate. An idealised version of Lempel-Ziv coding with side information is shown to be universally first- and second-order asymptotically optimal, under the same conditions. These results are in part based on a new almost-sure invariance principle for the conditional information density, which may be of independent interest.
Submitted 21 May, 2020;
originally announced May 2020.
-
The Lévy State Space Model
Authors:
Simon Godsill,
Marina Riabiz,
Ioannis Kontoyiannis
Abstract:
In this paper we introduce a new class of state space models based on shot-noise simulation representations of non-Gaussian Lévy-driven linear systems, represented as stochastic differential equations. In particular a conditionally Gaussian version of the models is proposed that is able to capture heavy-tailed non-Gaussianity while retaining tractability for inference procedures. We focus on a canonical class of such processes, the $α$-stable Lévy processes, which retain important properties such as self-similarity and heavy-tails, while emphasizing that broader classes of non-Gaussian Lévy processes may be handled by similar methodology. An important feature is that we are able to marginalise both the skewness and the scale parameters of these challenging models from posterior probability distributions. The models are posed in continuous time and so are able to deal with irregular data arrival times. Example modelling and inference procedures are provided using Rao-Blackwellised sequential Monte Carlo applied to a two-dimensional Langevin model, and this is tested on real exchange rate data.
Submitted 8 January, 2020; v1 submitted 28 December, 2019;
originally announced December 2019.
-
Fundamental Limits of Lossless Data Compression with Side Information
Authors:
Lampros Gavalakis,
Ioannis Kontoyiannis
Abstract:
The problem of lossless data compression with side information available to both the encoder and the decoder is considered. The finite-blocklength fundamental limits of the best achievable performance are defined, in two different versions of the problem: Reference-based compression, when a single side information string is used repeatedly in compressing different source messages, and pair-based compression, where a different side information string is used for each source message. General achievability and converse theorems are established for arbitrary source-side information pairs. Nonasymptotic normal approximation expansions are proved for the optimal rate in both the reference-based and pair-based settings, for memoryless sources. These are stated in terms of explicit, finite-blocklength bounds, that are tight up to third-order terms. Extensions that go significantly beyond the class of memoryless sources are obtained. The relevant source dispersion is identified and its relationship with the conditional varentropy rate is established. Interestingly, the dispersion is different in reference-based and pair-based compression, and it is proved that the reference-based dispersion is in general smaller.
Submitted 21 February, 2021; v1 submitted 11 December, 2019;
originally announced December 2019.
-
Differential Temporal Difference Learning
Authors:
Adithya M. Devraj,
Ioannis Kontoyiannis,
Sean P. Meyn
Abstract:
Value functions derived from Markov decision processes arise as a central component of algorithms as well as performance metrics in many statistics and engineering applications of machine learning techniques. Computation of the solution to the associated Bellman equations is challenging in most practical cases of interest. A popular class of approximation techniques, known as Temporal Difference (TD) learning algorithms, are an important sub-class of general reinforcement learning methods. The algorithms introduced in this paper are intended to resolve two well-known difficulties of TD-learning approaches: Their slow convergence due to very high variance, and the fact that, for the problem of computing the relative value function, consistent algorithms exist only in special cases. First we show that the gradients of these value functions admit a representation that lends itself to algorithm design. Based on this result, a new class of differential TD-learning algorithms is introduced. For Markovian models on Euclidean space with smooth dynamics, the algorithms are shown to be consistent under general conditions. Numerical results show dramatic variance reduction when compared to standard methods.
Submitted 27 February, 2020; v1 submitted 28 December, 2018;
originally announced December 2018.
-
A Simple Network of Nodes Moving on the Circle
Authors:
Dimitris Cheliotis,
Ioannis Kontoyiannis,
Michail Loulakis,
Stavros Toumpis
Abstract:
Two simple Markov processes are examined, one in discrete and one in continuous time, arising from idealized versions of a transmission protocol for mobile, delay-tolerant networks. We consider two independent walkers moving with constant speed on either the discrete or continuous circle, and changing directions at independent geometric (respectively, exponential) times. One of the walkers carries a message that wishes to travel as far and as fast as possible in the clockwise direction. The message stays with its current carrier unless the two walkers meet, the carrier is moving counter-clockwise, and the other walker is moving clockwise. In that case, the message jumps to the other walker. The long-term average clockwise speed of the message is computed. An explicit expression is derived via the solution of an associated boundary value problem in terms of the generator of the underlying Markov process. The average transmission cost is also similarly computed, measured as the long-term number of jumps the message makes per unit time. The tradeoff between speed and cost is examined, as a function of the underlying problem parameters.
Submitted 4 March, 2020; v1 submitted 11 August, 2018;
originally announced August 2018.
-
Nonasymptotic Gaussian Approximation for Inference with Stable Noise
Authors:
Marina Riabiz,
Tohid Ardeshiri,
Ioannis Kontoyiannis,
Simon Godsill
Abstract:
The results of a series of theoretical studies are reported, examining the convergence rate for different approximate representations of $α$-stable distributions. Although they play a key role in modelling random processes with jumps and discontinuities, the use of $α$-stable distributions in inference often leads to analytically intractable problems. The LePage series, which is a probabilistic representation employed in this work, is used to transform an intractable, infinite-dimensional inference problem into a conditionally Gaussian parametric problem. A major component of our approach is the approximation of the tail of this series by a Gaussian random variable. Standard statistical techniques, such as Expectation-Maximization, Markov chain Monte Carlo, and Particle Filtering, can then be applied. In addition to the asymptotic normality of the tail of this series, we establish explicit, nonasymptotic bounds on the approximation error. Their proofs follow classical Fourier-analytic arguments, using Esseen's smoothing lemma. Specifically, we consider the distance between the distributions of: $(i)$ the tail of the series and an appropriate Gaussian; $(ii)$ the full series and the truncated series; and $(iii)$ the full series and the truncated series with an added Gaussian term. In all three cases, sharp bounds are established, and the theoretical results are compared with the actual distances (computed numerically) in specific examples of symmetric $α$-stable distributions. This analysis facilitates the selection of appropriate truncations in practice and offers theoretical guarantees for the accuracy of resulting estimates. One of the main conclusions obtained is that, for the purposes of inference, the use of a truncated series together with an approximately Gaussian error term has superior statistical properties and is likely a preferable choice in practice.
Submitted 1 January, 2020; v1 submitted 27 February, 2018;
originally announced February 2018.
-
Packet Speed and Cost in Mobile Wireless Delay-Tolerant Networks
Authors:
Riccardo Cavallari,
Stavros Toumpis,
Roberto Verdone,
Ioannis Kontoyiannis
Abstract:
A mobile wireless delay-tolerant network (DTN) model is proposed and analyzed, in which infinitely many nodes are initially placed on R^2 according to a uniform Poisson point process (PPP) and subsequently travel, independently of each other, along trajectories comprised of line segments, changing travel direction at time instances that form a Poisson process, each time selecting a new travel direction from an arbitrary distribution; all nodes maintain constant speed. A single information packet is traveling towards a given direction using both wireless transmissions and sojourns on node buffers, according to a member of a broad class of possible routing rules. For this model, we compute the long-term averages of the speed with which the packet travels towards its destination and the rate with which the wireless transmission cost accumulates. Because of the complexity of the problem, we employ two intuitive, simplifying approximations; simulations verify that the approximation error is typically small. Our results quantify the fundamental trade-off that exists in mobile wireless DTNs between the packet speed and the packet delivery cost. The framework developed here is both general and versatile, and can be used as a starting point for further investigation.
Submitted 28 February, 2018; v1 submitted 7 January, 2018;
originally announced January 2018.
-
Entropy bounds on abelian groups and the Ruzsa divergence
Authors:
Mokshay Madiman,
Ioannis Kontoyiannis
Abstract:
Over the past few years, a family of interesting new inequalities for the entropies of sums and differences of random variables has been developed by Ruzsa, Tao and others, motivated by analogous results in additive combinatorics. The present work extends these earlier results to the case of random variables taking values in $\mathbb{R}^n$ or, more generally, in arbitrary locally compact and Polish abelian groups. We isolate and study a key quantity, the Ruzsa divergence between two probability distributions, and we show that its properties can be used to extend the earlier inequalities to the present general setting. The new results established include several variations on the theme that the entropies of the sum and the difference of two independent random variables severely constrain each other. Although the setting is quite general, the results are already of interest (and new) for random vectors in $\mathbb{R}^n$. In that special case, quantitative bounds are provided for the stability of the equality conditions in the entropy power inequality; a reverse entropy power inequality for log-concave random vectors is proved; an information-theoretic analog of the Rogers-Shephard inequality for convex bodies is established; and it is observed that some of these results lead to new inequalities for the determinants of positive-definite matrices. Moreover, by considering the multiplicative subgroups of the complex plane, one obtains new inequalities for the differential entropies of products and ratios of nonzero, complex-valued random variables.
Submitted 26 October, 2015; v1 submitted 17 August, 2015;
originally announced August 2015.
-
Estimating the Directed Information and Testing for Causality
Authors:
Ioannis Kontoyiannis,
Maria Skoularidou
Abstract:
The problem of estimating the directed information rate between two discrete processes $\{X_n\}$ and $\{Y_n\}$ via the plug-in (or maximum-likelihood) estimator is considered. When the joint process $\{(X_n,Y_n)\}$ is a Markov chain of a given memory length, the plug-in estimator is shown to be asymptotically Gaussian and to converge at the optimal rate $O(1/\sqrt{n})$ under appropriate conditions; this is the first estimator that has been shown to achieve this rate. An important connection is drawn between the problem of estimating the directed information rate and that of performing a hypothesis test for the presence of causal influence between the two processes. Under fairly general conditions, the null hypothesis, which corresponds to the absence of causal influence, is equivalent to the requirement that the directed information rate be equal to zero. In that case a finer result is established, showing that the plug-in converges at the faster rate $O(1/n)$ and that it is asymptotically $χ^2$-distributed. This is proved by showing that this estimator is equal to (a scalar multiple of) the classical likelihood ratio statistic for the above hypothesis test. Finally it is noted that these results facilitate the design of an actual likelihood ratio test for the presence or absence of causal influence.
Submitted 31 March, 2016; v1 submitted 5 July, 2015;
originally announced July 2015.
-
Lossless Data Compression at Finite Blocklengths
Authors:
Ioannis Kontoyiannis,
Sergio Verdu
Abstract:
This paper provides an extensive study of the behavior of the best achievable rate (and other related fundamental limits) in variable-length lossless compression. In the non-asymptotic regime, the fundamental limits of fixed-to-variable lossless compression with and without prefix constraints are shown to be tightly coupled. Several precise, quantitative bounds are derived, connecting the distribution of the optimal codelengths to the source information spectrum, and an exact analysis of the best achievable rate for arbitrary sources is given.
Fine asymptotic results are proved for arbitrary (not necessarily prefix) compressors on general mixing sources. Non-asymptotic, explicit Gaussian approximation bounds are established for the best achievable rate on Markov sources. The source dispersion and the source varentropy rate are defined and characterized. Together with the entropy rate, the varentropy rate serves to tightly approximate the fundamental non-asymptotic limits of fixed-to-variable compression for all but very small blocklengths.
Submitted 11 December, 2012;
originally announced December 2012.
-
Sumset and Inverse Sumset Inequalities for Differential Entropy and Mutual Information
Authors:
Ioannis Kontoyiannis,
Mokshay Madiman
Abstract:
The sumset and inverse sumset theories of Freiman, Plünnecke and Ruzsa give bounds connecting the cardinality of the sumset $A+B=\{a+b\;;\;a\in A,\,b\in B\}$ of two discrete sets $A,B$, to the cardinalities (or the finer structure) of the original sets $A,B$. For example, the sum-difference bound of Ruzsa states that $|A+B|\,|A|\,|B|\leq|A-B|^3$, where the difference set $A-B= \{a-b\;;\;a\in A,\,b\in B\}$. Interpreting the differential entropy $h(X)$ of a continuous random variable $X$ as (the logarithm of) the size of the effective support of $X$, the main contribution of this paper is a series of natural information-theoretic analogs for these results. For example, the Ruzsa sum-difference bound becomes the new inequality, $h(X+Y)+h(X)+h(Y)\leq 3h(X-Y)$, for any pair of independent continuous random variables $X$ and $Y$. Our results include differential-entropy versions of Ruzsa's triangle inequality, the Plünnecke-Ruzsa inequality, and the Balog-Szemerédi-Gowers lemma. Also we give a differential entropy version of the Freiman-Green-Ruzsa inverse-sumset theorem, which can be seen as a quantitative converse to the entropy power inequality. Versions of most of these results for the discrete entropy $H(X)$ were recently proved by Tao, relying heavily on a strong, functional form of the submodularity property of $H(X)$. Since differential entropy is not functionally submodular, in the continuous case many of the corresponding discrete proofs fail, in many cases requiring substantially new proof strategies. We find that the basic property that naturally replaces the discrete functional submodularity is the data processing property of mutual information.
Submitted 3 June, 2012;
originally announced June 2012.
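The discrete sum-difference bound quoted in the abstract above, $|A+B|\,|A|\,|B|\leq|A-B|^3$, can be checked numerically on small integer sets. The following is only an illustrative sketch: the function names and the randomly drawn example sets are assumptions made here and do not come from the paper.

```python
import random


def sumset(A, B):
    """Return the sumset A + B = {a + b : a in A, b in B}."""
    return {a + b for a in A for b in B}


def diffset(A, B):
    """Return the difference set A - B = {a - b : a in A, b in B}."""
    return {a - b for a in A for b in B}


def sum_difference_sides(A, B):
    """Return (lhs, rhs) of the bound |A+B| |A| |B| <= |A-B|^3."""
    lhs = len(sumset(A, B)) * len(A) * len(B)
    rhs = len(diffset(A, B)) ** 3
    return lhs, rhs


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    for _ in range(5):
        # Hypothetical example sets: random subsets of {0, ..., 49}.
        A = set(rng.sample(range(50), 6))
        B = set(rng.sample(range(50), 8))
        lhs, rhs = sum_difference_sides(A, B)
        print(sorted(A), sorted(B), lhs, rhs, lhs <= rhs)
```

The entropy analog $h(X+Y)+h(X)+h(Y)\leq 3h(X-Y)$ has the same shape, with logarithms of cardinalities replaced by differential entropies.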
-
Compound Poisson Approximation via Information Functionals
Authors:
A. D. Barbour,
Oliver Johnson,
Ioannis Kontoyiannis,
Mokshay Madiman
Abstract:
An information-theoretic development is given for the problem of compound Poisson approximation, which parallels earlier treatments for Gaussian and Poisson approximation. Let $P_{S_n}$ be the distribution of a sum $S_n=\sum_{i=1}^n Y_i$ of independent integer-valued random variables $Y_i$. Nonasymptotic bounds are derived for the distance between $P_{S_n}$ and an appropriately chosen compound Poisson law. In the case where all $Y_i$ have the same conditional distribution given $\{Y_i\neq 0\}$, a bound on the relative entropy distance between $P_{S_n}$ and the compound Poisson distribution is derived, based on the data-processing property of relative entropy and earlier Poisson approximation results. When the $Y_i$ have arbitrary distributions, corresponding bounds are derived in terms of the total variation distance. The main technical ingredient is the introduction of two "information functionals," and the analysis of their properties. These information functionals play a role analogous to that of the classical Fisher information in normal approximation. Detailed comparisons are made between the resulting inequalities and related bounds.
Submitted 21 April, 2010;
originally announced April 2010.
-
Log-concavity, ultra-log-concavity, and a maximum entropy property of discrete compound Poisson measures
Authors:
Oliver Johnson,
Ioannis Kontoyiannis,
Mokshay Madiman
Abstract:
Sufficient conditions are developed, under which the compound Poisson distribution has maximal entropy within a natural class of probability measures on the nonnegative integers. Recently, one of the authors [O. Johnson, Stoch. Proc. Appl., 2007] used a semigroup approach to show that the Poisson has maximal entropy among all ultra-log-concave distributions with fixed mean. We show via a non-trivial extension of this semigroup approach that the natural analog of the Poisson maximum entropy property remains valid if the compound Poisson distributions under consideration are log-concave, but that it fails in general. A parallel maximum entropy result is established for the family of compound binomial measures. Sufficient conditions for compound distributions to be log-concave are discussed and applications to combinatorics are examined; new bounds are derived on the entropy of the cardinality of a random independent set in a claw-free graph, and a connection is drawn to Mason's conjecture for matroids. The present results are primarily motivated by the desire to provide an information-theoretic foundation for compound Poisson approximation and associated limit theorems, analogous to the corresponding developments for the central limit theorem and for Poisson approximation. Our results also demonstrate new links between some probabilistic methods and the combinatorial notions of log-concavity and ultra-log-concavity, and they add to the growing body of work exploring the applications of maximum entropy characterizations to problems in discrete mathematics.
Submitted 27 September, 2011; v1 submitted 3 December, 2009;
originally announced December 2009.
-
Thinning, Entropy and the Law of Thin Numbers
Authors:
Peter Harremoes,
Oliver Johnson,
Ioannis Kontoyiannis
Abstract:
Renyi's "thinning" operation on a discrete random variable is a natural discrete analog of the scaling operation for continuous random variables. The properties of thinning are investigated in an information-theoretic context, especially in connection with information-theoretic inequalities related to Poisson approximation results. The classical Binomial-to-Poisson convergence (sometimes referred to as the "law of small numbers") is seen to be a special case of a thinning limit theorem for convolutions of discrete distributions. A rate of convergence is provided for this limit, and nonasymptotic bounds are also established. This development parallels, in part, the development of Gaussian inequalities leading to the information-theoretic version of the central limit theorem. In particular, a "thinning Markov chain" is introduced, and it is shown to play a role analogous to that of the Ornstein-Uhlenbeck process in connection to the entropy power inequality.
Submitted 3 June, 2009;
originally announced June 2009.
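As a rough numerical illustration of the thinning limit described in the abstract above, the sketch below assumes the usual definition of the α-thinning of a distribution on the nonnegative integers (each of the X units is kept independently with probability α). It thins the n-fold convolution of a base distribution by 1/n and compares the result with a Poisson law of the same mean. The base distribution, the helper names and the choice of total variation as the distance are assumptions made here for illustration only.

```python
from math import comb, exp


def thin(pmf, alpha):
    """alpha-thinning: law of a sum of X i.i.d. Bernoulli(alpha) variables, X ~ pmf."""
    out = [0.0] * len(pmf)
    for x, px in enumerate(pmf):
        for k in range(x + 1):
            out[k] += px * comb(x, k) * alpha**k * (1 - alpha) ** (x - k)
    return out


def convolve(p, q):
    """Distribution of the sum of two independent variables with pmfs p and q."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def n_fold(pmf, n):
    """n-fold convolution of pmf with itself."""
    out = [1.0]
    for _ in range(n):
        out = convolve(out, pmf)
    return out


def poisson_pmf(lam, size):
    """First `size` values of the Poisson(lam) pmf."""
    out = [exp(-lam)]
    for k in range(1, size):
        out.append(out[-1] * lam / k)
    return out


def total_variation(p, q):
    """Total variation distance over the common index range (any tail beyond it is ignored)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    base = [0.2, 0.5, 0.3]  # hypothetical pmf on {0, 1, 2}
    lam = sum(k * pk for k, pk in enumerate(base))
    for n in (5, 20, 40):
        thinned = thin(n_fold(base, n), 1.0 / n)
        print(n, total_variation(thinned, poisson_pmf(lam, len(thinned))))
```

Taking the base distribution to be Bernoulli(p) recovers the classical case: the 1/n-thinning of Binomial(n, p) is Binomial(n, p/n), which approaches Poisson(p).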
-
Lossy Compression in Near-Linear Time via Efficient Random Codebooks and Databases
Authors:
Chris Gioran,
Ioannis Kontoyiannis
Abstract:
The compression-complexity trade-off of lossy compression algorithms that are based on a random codebook or a random database is examined. Motivated, in part, by recent results of Gupta-Verdú-Weissman (GVW) and their underlying connections with the pattern-matching scheme of Kontoyiannis' lossy Lempel-Ziv algorithm, we introduce a non-universal version of the lossy Lempel-Ziv method (termed LLZ). The optimality of LLZ for memoryless sources is established, and its performance is compared to that of the GVW divide-and-conquer approach. Experimental results indicate that the GVW approach often yields better compression than LLZ, but at the price of much higher memory requirements. To combine the advantages of both, we introduce a hybrid algorithm (HYB) that utilizes both the divide-and-conquer idea of GVW and the single-database structure of LLZ. It is proved that HYB shares with GVW the exact same rate-distortion performance and implementation complexity, while, like LLZ, requiring less memory, by a factor which may become unbounded, depending on the choice of the relevant design parameters. Experimental results are also presented, illustrating the performance of all three methods on data generated by simple discrete memoryless sources. In particular, the HYB algorithm is shown to outperform existing schemes for the compression of some simple discrete sources with respect to the Hamming distortion criterion.
Submitted 21 April, 2009;
originally announced April 2009.
-
On the entropy and log-concavity of compound Poisson measures
Authors:
Oliver Johnson,
Ioannis Kontoyiannis,
Mokshay Madiman
Abstract:
Motivated, in part, by the desire to develop an information-theoretic foundation for compound Poisson approximation limit theorems (analogous to the corresponding developments for the central limit theorem and for simple Poisson approximation), this work examines sufficient conditions under which the compound Poisson distribution has maximal entropy within a natural class of probability measures on the nonnegative integers. We show that the natural analog of the Poisson maximum entropy property remains valid if the measures under consideration are log-concave, but that it fails in general. A parallel maximum entropy result is established for the family of compound binomial measures. The proofs are largely based on ideas related to the semigroup approach introduced in recent work by Johnson for the Poisson family. Sufficient conditions are given for compound distributions to be log-concave, and specific examples are presented illustrating all the above results.
Submitted 27 May, 2008;
originally announced May 2008.
-
Estimating the entropy of binary time series: Methodology, some theory and a simulation study
Authors:
Y. Gao,
I. Kontoyiannis,
E. Bienenstock
Abstract:
Partly motivated by entropy-estimation problems in neuroscience, we present a detailed and extensive comparison between some of the most popular and effective entropy estimation methods used in practice: The plug-in method, four different estimators based on the Lempel-Ziv (LZ) family of data compression algorithms, an estimator based on the Context-Tree Weighting (CTW) method, and the renewal entropy estimator.
Methodology. Three new entropy estimators are introduced. For two of the four LZ-based estimators, a bootstrap procedure is described for evaluating their standard error, and a practical rule of thumb is heuristically derived for selecting the values of their parameters.
Theory. We prove that, unlike their earlier versions, the two new LZ-based estimators are consistent for every finite-valued, stationary and ergodic process. An effective method is derived for the accurate approximation of the entropy rate of a finite-state HMM with known distribution. Heuristic calculations are presented and approximate formulas are derived for evaluating the bias and the standard error of each estimator.
Simulation. All estimators are applied to a wide range of data generated by numerous different processes with varying degrees of dependence and memory. Some conclusions drawn from these experiments include: (i) For all estimators considered, the main source of error is the bias. (ii) The CTW method is repeatedly and consistently seen to provide the most accurate results. (iii) The performance of the LZ-based estimators is often comparable to that of the plug-in method. (iv) The main drawback of the plug-in method is its computational inefficiency.
Submitted 29 February, 2008;
originally announced February 2008.
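For readers unfamiliar with the plug-in method mentioned in the abstract above, a minimal sketch of one common form is given here: the empirical entropy of overlapping blocks of length k, divided by k. The abstract does not spell out the exact variant used in the paper, so the use of overlapping blocks, the function name and the parameter `k` are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def plugin_entropy_rate(bits, k):
    """Plug-in estimate of the entropy rate, in bits per symbol.

    Computes the entropy of the empirical distribution of overlapping
    length-k blocks of `bits` and divides it by k.
    """
    n_blocks = len(bits) - k + 1
    if k < 1 or n_blocks < 1:
        raise ValueError("need 1 <= k <= len(bits)")
    counts = Counter(tuple(bits[i:i + k]) for i in range(n_blocks))
    block_entropy = -sum((c / n_blocks) * log2(c / n_blocks) for c in counts.values())
    return block_entropy / k


if __name__ == "__main__":
    import random

    rng = random.Random(0)  # fixed seed so the run is reproducible
    # Hypothetical test data: i.i.d. fair coin flips, true entropy rate 1 bit/symbol.
    data = [rng.randint(0, 1) for _ in range(10_000)]
    for k in (1, 4, 8, 12):
        print(k, plugin_entropy_rate(data, k))
```

The number of possible blocks grows as 2^k for binary data, so large block lengths need very long sequences before the empirical block frequencies become reliable.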
-
Identifying statistical dependence in genomic sequences via mutual information estimates
Authors:
H. M. Aktulga,
I. Kontoyiannis,
L. A. Lyznik,
L. Szpankowski,
A. Y. Grama,
W. Szpankowski
Abstract:
Questions of understanding and quantifying the representation and amount of information in organisms have become a central part of biological research, as they potentially hold the key to fundamental advances. In this paper, we demonstrate the use of information-theoretic tools for the task of identifying segments of biomolecules (DNA or RNA) that are statistically correlated. We develop a precise and reliable methodology, based on the notion of mutual information, for finding and extracting statistical as well as structural dependencies. A simple threshold function is defined, and its use in quantifying the level of significance of dependencies between biological segments is explored. These tools are used in two specific applications. First, for the identification of correlations between different parts of the maize zmSRp32 gene. There, we find significant dependencies between the 5' untranslated region in zmSRp32 and its alternatively spliced exons. This observation may indicate the presence of as-yet unknown alternative splicing mechanisms or structural scaffolds. Second, using data from the FBI's Combined DNA Index System (CODIS), we demonstrate that our approach is particularly well suited for the problem of discovering short tandem repeats, an application of importance in genetic profiling.
Submitted 26 October, 2007;
originally announced October 2007.
-
From the entropy to the statistical structure of spike trains
Authors:
Yun Gao,
Ioannis Kontoyiannis,
Elie Bienenstock
Abstract:
We use statistical estimates of the entropy rate of spike train data in order to make inferences about the underlying structure of the spike train itself. We first examine a number of different parametric and nonparametric estimators (some known and some new), including the "plug-in" method, several versions of Lempel-Ziv-based compression algorithms, a maximum likelihood estimator tailored to renewal processes, and the natural estimator derived from the Context-Tree Weighting method (CTW). The theoretical properties of these estimators are examined, several new theoretical results are developed, and all estimators are systematically applied to various types of synthetic data and under different conditions.
Our main focus is on the performance of these entropy estimators on the (binary) spike trains of 28 neurons recorded simultaneously for a one-hour period from the primary motor and dorsal premotor cortices of a monkey. We show how the entropy estimates can be used to test for the existence of long-term structure in the data, and we construct a hypothesis test for whether the renewal process model is appropriate for these spike trains. Further, by applying the CTW algorithm we derive the maximum a posteriori (MAP) tree model of our empirical data, and comment on the underlying structure it reveals.
Submitted 27 March, 2008; v1 submitted 22 October, 2007;
originally announced October 2007.
-
Some information-theoretic computations related to the distribution of prime numbers
Authors:
Ioannis Kontoyiannis
Abstract:
We illustrate how elementary information-theoretic ideas may be employed to provide proofs for well-known, nontrivial results in number theory. Specifically, we give an elementary and fairly short proof of the following asymptotic result: The sum of (log p)/p, taken over all primes p not exceeding n, is asymptotic to log n as n tends to infinity. We also give finite-n bounds refining the above limit. This result, originally proved by Chebyshev in 1852, is closely related to the celebrated prime number theorem.
Submitted 5 November, 2007; v1 submitted 22 October, 2007;
originally announced October 2007.
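The asymptotic statement in the abstract above (the sum of (log p)/p over primes p not exceeding n is asymptotic to log n) is easy to examine numerically. The sketch below is only a sanity check, not a reproduction of the paper's finite-n bounds; the function names and the chosen values of n are assumptions made for illustration.

```python
import math


def primes_up_to(n):
    """Sieve of Eratosthenes: return the list of all primes p <= n."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            # Mark every multiple of i, starting from i*i, as composite.
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def log_p_over_p_sum(n):
    """Sum of (log p)/p over all primes p <= n."""
    return sum(math.log(p) / p for p in primes_up_to(n))


if __name__ == "__main__":
    for n in (10**2, 10**3, 10**4, 10**5, 10**6):
        s = log_p_over_p_sum(n)
        print(n, s, math.log(n), s / math.log(n))
```

The ratio in the last column approaches 1 only slowly, because the difference between the sum and log n stays bounded rather than vanishing.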
-
Estimation of the Rate-Distortion Function
Authors:
M. T. Harrison,
I. Kontoyiannis
Abstract:
Motivated by questions in lossy data compression and by theoretical considerations, we examine the problem of estimating the rate-distortion function of an unknown (not necessarily discrete-valued) source from empirical data. Our focus is the behavior of the so-called "plug-in" estimator, which is simply the rate-distortion function of the empirical distribution of the observed data. Sufficient conditions are given for its consistency, and examples are provided to demonstrate that in certain cases it fails to converge to the true rate-distortion function. The analysis of its performance is complicated by the fact that the rate-distortion function is not continuous in the source distribution; the underlying mathematical problem is closely related to the classical problem of establishing the consistency of maximum likelihood estimators. General consistency results are given for the plug-in estimator applied to a broad class of sources, including all stationary and ergodic ones. A more general class of estimation problems is also considered, arising in the context of lossy data compression when the allowed class of coding distributions is restricted; analogous results are developed for the plug-in estimator in that case. Finally, consistency theorems are formulated for modified (e.g., penalized) versions of the plug-in, and for estimating the optimal reproduction distribution.
Submitted 11 April, 2008; v1 submitted 2 February, 2007;
originally announced February 2007.
-
Mismatched codebooks and the role of entropy-coding in lossy data compression
Authors:
Ioannis Kontoyiannis,
Rami Zamir
Abstract:
We introduce a universal quantization scheme based on random coding, and we analyze its performance. This scheme consists of a source-independent random codebook (typically mismatched to the source distribution), followed by optimal entropy-coding that is matched to the quantized codeword distribution. A single-letter formula is derived for the rate achieved by this scheme at a given distortion, in the limit of large codebook dimension. The rate reduction due to entropy-coding is quantified, and it is shown that it can be arbitrarily large. In the special case of "almost uniform" codebooks (e.g., an i.i.d. Gaussian codebook with large variance) and difference distortion measures, a novel connection is drawn between the compression achieved by the present scheme and the performance of "universal" entropy-coded dithered lattice quantizers. This connection generalizes the "half-a-bit" bound on the redundancy of dithered lattice quantizers. Moreover, it demonstrates a strong notion of universality where a single "almost uniform" codebook is near-optimal for any source and any difference distortion measure.
Submitted 2 November, 2005;
originally announced November 2005.
-
Source Coding, Large Deviations, and Approximate Pattern Matching
Authors:
A. Dembo,
I. Kontoyiannis
Abstract:
We present a development of parts of rate-distortion theory and pattern-matching algorithms for lossy data compression, centered around a lossy version of the Asymptotic Equipartition Property (AEP). This treatment closely parallels the corresponding development in lossless compression, a point of view that was advanced in an important paper of Wyner and Ziv in 1989. In the lossless case we review how the AEP underlies the analysis of the Lempel-Ziv algorithm by viewing it as a random code and reducing it to the idealized Shannon code. This also provides information about the redundancy of the Lempel-Ziv algorithm and about the asymptotic behavior of several relevant quantities. In the lossy case we give various versions of the statement of the generalized AEP and we outline the general methodology of its proof via large deviations. Its relationship with Barron's generalized AEP is also discussed. The lossy AEP is applied to: (i) prove strengthened versions of Shannon's source coding theorem and universal coding theorems; (ii) characterize the performance of mismatched codebooks; (iii) analyze the performance of pattern-matching algorithms for lossy compression; (iv) determine the first order asymptotics of waiting times (with distortion) between stationary processes; (v) characterize the best achievable rate of weighted codebooks as an optimal sphere-covering exponent. We then present a refinement to the lossy AEP and use it to: (i) prove second order coding theorems; (ii) characterize which sources are easier to compress; (iii) determine the second order asymptotics of waiting times; (iv) determine the precise asymptotic behavior of longest match-lengths. Extensions to random fields are also given.
Submitted 1 March, 2001;
originally announced March 2001.
-
Critical Behavior in Lossy Source Coding
Authors:
Amir Dembo,
Ioannis Kontoyiannis
Abstract:
The following critical phenomenon was recently discovered. When a memoryless source is compressed using a variable-length fixed-distortion code, the fastest convergence rate of the (pointwise) compression ratio to the optimal $R(D)$ bits/symbol is either $O(\sqrt{n})$ or $O(\log n)$. We show it is always $O(\sqrt{n})$, except for discrete, uniformly distributed sources.
Submitted 1 September, 2000;
originally announced September 2000.
-
Efficient sphere-covering and converse measure concentration via generalized coding theorems
Authors:
Ioannis Kontoyiannis
Abstract:
Suppose A is a finite set equipped with a probability measure P and let M be a "mass" function on A. We give a probabilistic characterization of the most efficient way in which A^n can be almost-covered using spheres of a fixed radius. An almost-covering is a subset C_n of A^n, such that the union of the spheres centered at the points of C_n has probability close to one with respect to the product measure P^n. An efficient covering is one with small mass M^n(C_n); n is typically large. With different choices for M and the geometry on A our results give various corollaries as special cases, including Shannon's data compression theorem, a version of Stein's lemma (in hypothesis testing), and a new converse to some measure concentration inequalities on discrete spaces. Under mild conditions, we generalize our results to abstract spaces and non-product measures.
Submitted 27 September, 2000; v1 submitted 12 October, 1999;
originally announced October 1999.