-
The Group Theoretic Roots of Information: permutations, symmetry, and entropy
Authors:
David J. Galas
Abstract:
We propose a new interpretation of measures of information and disorder by connecting these concepts to group theory in a new way. Entropy and group theory are connected here by their common relation to sets of permutations. A combinatorial measure of information and disorder is proposed, in terms of integers and discrete functions, that we call the integer entropy. The Shannon measure of information is the limiting case of a richer, more general conceptual structure that reveals relations among finite groups, information, and symmetries. It is shown that the integer entropy converges uniformly to the Shannon entropy when the group includes all permutations (the symmetric group) and the number of objects increases without bound. The harmonic numbers have a well-known combinatorial meaning as the expected number of disjoint, non-empty cycles in permutations of n objects, and since integer entropy is defined in terms of the expected value of the number of cycles over the set of permutations, it also has a clear combinatorial meaning. Since all finite groups are isomorphic to subgroups of the symmetric group, every finite group has a corresponding information functional, analogous to the Shannon entropy, and a number series analogous to the harmonic numbers. The Cameron-Semeraro cycle polynomial is used to analyze the integer entropy for finite groups, and to characterize the series analogous to the harmonic numbers. We introduce and use a reciprocal polynomial, the transposition polynomial, which provides an additional tool and new insights. Broken symmetries and conserved quantities are linked through the cycle and transposition properties of the groups, and can be used to generalize the analysis of stochastic processes.
Submitted 21 November, 2019; v1 submitted 23 August, 2019;
originally announced August 2019.
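The combinatorial fact cited in the abstract above, that the harmonic number H_n equals the expected number of disjoint cycles in a uniformly random permutation of n objects, can be checked by brute force for small n. The following is a minimal sketch for illustration only; the function names (cycle_count, mean_cycles, harmonic) are hypothetical and are not taken from the paper, and the paper's integer entropy itself is not reproduced here.

```python
from fractions import Fraction
from itertools import permutations


def cycle_count(perm):
    """Count the disjoint cycles of a permutation given as a tuple of images."""
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            j = start
            # Follow the cycle containing `start` until it closes.
            while not seen[j]:
                seen[j] = True
                j = perm[j]
    return cycles


def mean_cycles(n):
    """Exact average number of cycles over all n! permutations of n objects."""
    counts = [cycle_count(p) for p in permutations(range(n))]
    return Fraction(sum(counts), len(counts))


def harmonic(n):
    """Exact n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(Fraction(1, k) for k in range(1, n + 1))
```

Exact rational arithmetic is used so that the comparison mean_cycles(n) == harmonic(n) is an identity check rather than a floating-point approximation. Enumeration grows as n!, so this is only practical for small n.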
-
Expansion of the Kullback-Leibler Divergence, and a new class of information metrics
Authors:
David J. Galas,
T. Gregory Dewey,
James Kunert-Graf,
Nikita A. Sakhanenko
Abstract:
Inferring and comparing complex, multivariable probability density functions is fundamental to problems in several fields, including probabilistic learning, network theory, and data analysis. Classification and prediction are the two faces of this class of problem. We take an approach here that simplifies many aspects of these problems by presenting a structured, series expansion of the Kullback-Leibler divergence, a function central to information theory, and devise a distance metric based on this divergence. Using the Möbius inversion duality between multivariable entropies and multivariable interaction information, we express the divergence as an additive series in the number of interacting variables, which provides a restricted and simplified set of distributions to use as approximations and with which to model data. Truncations of this series yield approximations based on the number of interacting variables. The first few terms of the expansion-truncation are illustrated and shown to lead naturally to familiar approximations, including the well-known Kirkwood superposition approximation. Truncation can also induce a simple relation between the multi-information and the interaction information. A measure of distance between distributions, based on Kullback-Leibler divergence, is then described and shown to be a true metric if properly restricted. The expansion is shown to generate a hierarchy of metrics and to connect this work to information geometry formalisms. We give an example of the application of these metrics to a graph comparison problem, which shows that the formalism can be applied to a wide range of network problems and that it provides a general approach for systematic approximations in numbers of interactions or connections, along with a related quantitative metric.
Submitted 29 March, 2017; v1 submitted 31 January, 2017;
originally announced February 2017.
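To make the quantities named in the abstract above concrete, here is a small sketch of the Kullback-Leibler divergence and of the Kirkwood superposition approximation for three discrete variables. It assumes the standard textbook form p(x,y)p(x,z)p(y,z) / (p(x)p(y)p(z)), which the abstract does not spell out, and it does not reproduce the paper's series expansion. All function names are hypothetical, and the code assumes strictly positive marginals.

```python
import math
from itertools import product


def independent_joint(px, py, pz):
    """Joint distribution {(x, y, z): prob} of three independent variables."""
    ranges = (range(len(px)), range(len(py)), range(len(pz)))
    return {(x, y, z): px[x] * py[y] * pz[z] for x, y, z in product(*ranges)}


def marginal(p, axes):
    """Marginalize a joint distribution {(x, y, z): prob} onto the given axes."""
    out = {}
    for state, prob in p.items():
        key = tuple(state[a] for a in axes)
        out[key] = out.get(key, 0.0) + prob
    return out


def kirkwood(p):
    """Kirkwood superposition estimate of a three-variable joint.

    Assumed form: p(x,y) p(x,z) p(y,z) / (p(x) p(y) p(z)).
    The result is not renormalized and need not sum to one in general.
    """
    p01, p02, p12 = marginal(p, (0, 1)), marginal(p, (0, 2)), marginal(p, (1, 2))
    p0, p1, p2 = marginal(p, (0,)), marginal(p, (1,)), marginal(p, (2,))
    return {
        (x, y, z): p01[(x, y)] * p02[(x, z)] * p12[(y, z)]
        / (p0[(x,)] * p1[(y,)] * p2[(z,)])
        for (x, y, z) in p
    }


def kl_divergence(p, q):
    """D(p || q) = sum_x p(x) log(p(x) / q(x)) in nats, skipping p(x) = 0."""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)
```

For an independent joint distribution the approximation is exact up to rounding, so kl_divergence(p, kirkwood(p)) is numerically zero; for dependent variables the divergence measures how much three-way structure the pairwise form misses.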
-
Multivariate information measures: a unification using Möbius operators on subset lattices
Authors:
David J. Galas,
Nikita A. Sakhanenko
Abstract:
Information-related measures are useful tools for multivariable data analysis, as measures of dependence among variables, and as descriptions of order in biological and physical systems. Such measures, including marginal entropies, mutual information, interaction information, and multi-information, have been used in a number of fields including descriptions of systems complexity and biological data analysis. The mathematical relationships among these measures are therefore of significant interest. Relations between common information measures include the duality relations based on Möbius inversion on lattices. These are the direct consequence of the symmetries of the lattices of the sets of variables (subsets ordered by inclusion). While the mathematical properties and relationships among these information-related measures are of significant interest, there has been, to our knowledge, no systematic examination of the full range of relationships and no unification of this diverse range of functions into a single formalism as we do here. In this paper we define operators on functions on these lattices based on the Möbius inversion idea that map the functions into one another (Möbius operators). We show that these operators form a simple group isomorphic to the symmetric group S3. Relations among the set of functions on the lattice are transparently expressed in terms of the operator algebra, and, applied to the information measures, can be used to derive a wide range of relationships among measures. We describe a direct relation between sums of conditional log-likelihoods and previously defined dependency measures. The algebra is naturally generalized, which yields more extensive relationships. This formalism provides a fundamental unification of information-related measures, and the isomorphism of all distributive lattices with the subset lattice implies broad potential application of these results.
Submitted 26 September, 2016; v1 submitted 25 January, 2016;
originally announced January 2016.
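The Möbius inversion on a subset lattice that the abstract above builds on can be illustrated with generic set functions. The sketch below shows only the standard inversion pair, g(S) = sum of f(T) over T ⊆ S and f(S) = sum of (-1)^(|S|-|T|) g(T) over T ⊆ S. The sign convention the authors use to relate entropies and interaction information is not stated in the abstract, so no information measure is computed here, and the function names are hypothetical.

```python
from itertools import combinations


def subsets(s):
    """Yield every subset of s (ordered by size) as a frozenset."""
    items = tuple(s)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def zeta_transform(f, universe):
    """g(S) = sum of f(T) over all subsets T of S, for every S in the lattice."""
    return {S: sum(f[T] for T in subsets(S)) for S in subsets(universe)}


def mobius_transform(g, universe):
    """f(S) = sum over T subset of S of (-1)^(|S| - |T|) * g(T)."""
    return {
        S: sum((-1) ** (len(S) - len(T)) * g[T] for T in subsets(S))
        for S in subsets(universe)
    }
```

Applying mobius_transform to the output of zeta_transform recovers the original function on every subset, which is the duality the lattice-based relations rely on. The direct double loop costs on the order of 3^n operations for n variables, which is fine for illustration but not for large n.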
-
On Lattices and the Dualities of Information Measures
Authors:
David J. Galas,
Nikita A. Sakhanenko,
Benjamin Keller
Abstract:
Measures of dependence among variables, and measures of information content and shared information, have become valuable tools of multi-variable data analysis. Information measures, like marginal entropies, mutual information, and multi-information, have a number of significant advantages over more standard statistical methods, such as reduced sensitivity to sampling limitations compared with statistical estimates of probability densities. There are also interesting applications of these measures to the theory of complexity and to statistical mechanics. Their mathematical properties and relationships are therefore of interest at several levels.
Of the interesting relationships between common information measures, perhaps none is more intriguing or elegant than the duality relationships based on Möbius inversions. These inversions are directly related to the lattices (posets) that describe these sets of variables and their multi-variable measures. In this paper we describe extensions of the duality previously noted by Bell to a range of measures, and show how the structure of the lattice determines fundamental relationships of these functions. Our major result is a set of interlinked duality relations among marginal entropies, interaction information, and conditional interaction information. The implications of these results include a flexible range of alternative formulations of information-based measures, and a new set of sum rules that arise from path-independent sums on the lattice. Our motivation is to advance the fundamental integration of this set of ideas and relations, and to show explicitly the ways in which all these measures are interrelated through lattice properties. These ideas can be useful in constructing theories of complexity, descriptions of large scale stochastic processes and systems, and in devising algorithms and approximations for computations in multi-variable data analysis.
Submitted 31 July, 2013;
originally announced August 2013.
-
Describing the complexity of systems: multi-variable "set complexity" and the information basis of systems biology
Authors:
David J. Galas,
Nikita A. Sakhanenko,
Alexander Skupin,
Tomasz Ignac
Abstract:
Context dependence is central to the description of complexity. Keying on the pairwise definition of "set complexity," we use an information theory approach to formulate general measures of systems complexity. We examine the properties of multi-variable dependency starting with the concept of interaction information. We then present a new measure for unbiased detection of multi-variable dependency, "differential interaction information." This quantity for two variables reduces to the pairwise "set complexity" previously proposed as a context-dependent measure of information in biological systems. We generalize it here to an arbitrary number of variables. Critical limiting properties of the "differential interaction information" are key to the generalization. This measure extends previous ideas about biological information and provides a more sophisticated basis for the study of complexity. The properties of "differential interaction information" also suggest new approaches to data analysis. Given a data set of system measurements, differential interaction information can provide a measure of collective dependence, which can be represented in hypergraphs describing complex system interaction patterns. We investigate this kind of analysis using simulated data sets. The conjoining of a generalized set complexity measure, multi-variable dependency analysis, and hypergraphs is our central result. While our focus is on complex biological systems, our results are applicable to any complex system.
Submitted 19 August, 2013; v1 submitted 27 February, 2013;
originally announced February 2013.
-
Set-based complexity and biological information
Authors:
David J. Galas,
Matti Nykter,
Gregory W. Carter,
Nathan D. Price,
Ilya Shmulevich
Abstract:
It is not obvious what fraction of all the potential information residing in the molecules and structures of living systems is significant or meaningful to the system. Sets of random sequences or identically repeated sequences, for example, would be expected to contribute little or no useful information to a cell. This issue of quantitation of information is important since the ebb and flow of biologically significant information is essential to our quantitative understanding of biological function and evolution. Motivated specifically by these problems of biological information, we propose here a class of measures to quantify the contextual nature of the information in sets of objects, based on Kolmogorov's intrinsic complexity. Such measures discount both random and redundant information and are inherent in that they do not require a defined state space to quantify the information. The maximization of this new measure, which can be formulated in terms of the universal information distance, appears to have several useful and interesting properties, some of which we illustrate with examples.
Submitted 25 January, 2008;
originally announced January 2008.