-
Partial Information Decomposition for Continuous Variables based on Shared Exclusions: Analytical Formulation and Estimation
Authors:
David A. Ehrlich,
Kyle Schick-Poland,
Abdullah Makkeh,
Felix Lanfermann,
Patricia Wollstadt,
Michael Wibral
Abstract:
Describing statistical dependencies is foundational to empirical scientific research. For uncovering intricate and possibly non-linear dependencies between a single target variable and several source variables within a system, a principled and versatile framework can be found in the theory of Partial Information Decomposition (PID). Nevertheless, the majority of existing PID measures are restricted to categorical variables, while many systems of interest in science are continuous. In this paper, we present a novel analytic formulation for continuous redundancy--a generalization of mutual information--drawing inspiration from the concept of shared exclusions in probability space as in the discrete PID definition of $I^\mathrm{sx}_\cap$. Furthermore, we introduce a nearest-neighbor based estimator for continuous PID, and showcase its effectiveness by applying it to a simulated energy management system provided by the Honda Research Institute Europe GmbH. This work bridges the gap between the measure-theoretically postulated existence proofs for a continuous $I^\mathrm{sx}_\cap$ and its practical application to real-world scientific problems.
Submitted 27 March, 2024; v1 submitted 10 November, 2023;
originally announced November 2023.
-
A General Framework for Interpretable Neural Learning based on Local Information-Theoretic Goal Functions
Authors:
Abdullah Makkeh,
Marcel Graetz,
Andreas C. Schneider,
David A. Ehrlich,
Viola Priesemann,
Michael Wibral
Abstract:
Despite the impressive performance of biological and artificial networks, an intuitive understanding of how their local learning dynamics contribute to network-level task solutions remains a challenge to this date. Efforts to bring learning to a more local scale have indeed led to valuable insights; however, a general constructive approach to describe local learning goals that is both interpretable and adaptable across diverse tasks is still missing. We have previously formulated a local information processing goal that is highly adaptable and interpretable for a model neuron with compartmental structure. Building on recent advances in Partial Information Decomposition (PID), we here derive a corresponding parametric local learning rule, which allows us to introduce 'infomorphic' neural networks. We demonstrate the versatility of these networks to perform tasks from supervised, unsupervised and memory learning. By leveraging the interpretable nature of the PID framework, infomorphic networks represent a valuable tool to advance our understanding of the intricate structure of local learning.
Submitted 30 April, 2024; v1 submitted 3 June, 2023;
originally announced June 2023.
-
From Babel to Boole: The Logical Organization of Information Decompositions
Authors:
Aaron J. Gutknecht,
Abdullah Makkeh,
Michael Wibral
Abstract:
The conventional approach to the general Partial Information Decomposition (PID) problem has been redundancy-based: specifying a measure of redundant information between collections of source variables induces a PID via Moebius-Inversion over the so-called redundancy lattice. Despite the prevalence of this method, there has been ongoing interest in examining the problem through the lens of different base-concepts of information, such as synergy, unique information, or union information. Yet, a comprehensive understanding of the logical organization of these different base-concepts and their associated PIDs remains elusive. In this work, we apply the mereological formulation of PID that we introduced in a recent paper to shed light on this problem. Within the mereological approach, base-concepts can be expressed in terms of conditions phrased in formal logic on the specific parthood relations between the PID components and the different mutual information terms. We set forth a general pattern of these logical conditions of which all PID base-concepts in the literature are special cases and that also reveals novel base-concepts, in particular a concept we call "vulnerable information".
Submitted 25 October, 2023; v1 submitted 1 June, 2023;
originally announced June 2023.
-
A Measure of the Complexity of Neural Representations based on Partial Information Decomposition
Authors:
David A. Ehrlich,
Andreas C. Schneider,
Viola Priesemann,
Michael Wibral,
Abdullah Makkeh
Abstract:
In neural networks, task-relevant information is represented jointly by groups of neurons. However, the specific way in which this mutual information about the classification label is distributed among the individual neurons is not well understood: While parts of it may only be obtainable from specific single neurons, other parts are carried redundantly or synergistically by multiple neurons. We show how Partial Information Decomposition (PID), a recent extension of information theory, can disentangle these different contributions. From this, we introduce the measure of "Representational Complexity", which quantifies the difficulty of accessing information spread across multiple neurons. We show how this complexity is directly computable for smaller layers. For larger layers, we propose subsampling and coarse-graining procedures and prove corresponding bounds on the latter. Empirically, for quantized deep neural networks solving the MNIST and CIFAR10 tasks, we observe that representational complexity decreases both through successive hidden layers and over training, and compare the results to related measures. Overall, we propose representational complexity as a principled and interpretable summary statistic for analyzing the structure and evolution of neural representations and complex systems in general.
Submitted 17 May, 2023; v1 submitted 21 September, 2022;
originally announced September 2022.
-
A partial information decomposition for discrete and continuous variables
Authors:
Kyle Schick-Poland,
Abdullah Makkeh,
Aaron J. Gutknecht,
Patricia Wollstadt,
Anja Sturm,
Michael Wibral
Abstract:
Conceptually, partial information decomposition (PID) is concerned with separating the information contributions several sources hold about a certain target by decomposing the corresponding joint mutual information into contributions such as synergistic, redundant, or unique information. Despite PID conceptually being defined for any type of random variables, so far, PID could only be quantified for the joint mutual information of discrete systems. Recently, a quantification for PID in continuous settings for two or three source variables was introduced. Nonetheless, no ansatz has yet managed both to quantify PID for more than three variables and to cover general measure-theoretic random variables, such as mixed discrete-continuous or continuous random variables. In this work we propose an information quantity, defining the terms of a PID, which is well-defined for any number or type of source or target random variables. This proposed quantity is tightly related to a recently developed local shared information quantity for discrete random variables based on the idea of shared exclusions. Further, we prove that this newly proposed information measure fulfills various desirable properties, such as satisfying a set of local PID axioms, invariance under invertible transformations, differentiability with respect to the underlying probability density, and admitting a target chain rule.
Submitted 24 June, 2021; v1 submitted 23 June, 2021;
originally announced June 2021.
-
Estimating the Unique Information of Continuous Variables
Authors:
Ari Pakman,
Amin Nejatbakhsh,
Dar Gilboa,
Abdullah Makkeh,
Luca Mazzucato,
Michael Wibral,
Elad Schneidman
Abstract:
The integration and transfer of information from multiple sources to multiple targets is a core motif of neural systems. The emerging field of partial information decomposition (PID) provides a novel information-theoretic lens into these mechanisms by identifying synergistic, redundant, and unique contributions to the mutual information between one and several variables. While many works have studied aspects of PID for Gaussian and discrete distributions, the case of general continuous distributions is still uncharted territory. In this work we present a method for estimating the unique information in continuous distributions, for the case of one versus two variables. Our method solves the associated optimization problem over the space of distributions with fixed bivariate marginals by combining copula decompositions and techniques developed to optimize variational autoencoders. We obtain excellent agreement with known analytic results for Gaussians, and illustrate the power of our new approach in several brain-inspired neural models. Our method is capable of recovering the effective connectivity of a chaotic network of rate neurons, and uncovers a complex trade-off between redundancy, synergy and unique information in recurrent networks trained to solve a generalized XOR task.
Submitted 26 October, 2021; v1 submitted 30 January, 2021;
originally announced February 2021.
-
Partial Information Decomposition of Boolean Functions: a Fourier Analysis perspective
Authors:
Abdullah Makkeh,
Dirk Oliver Theis,
Raul Vicente
Abstract:
Partial information decomposition (PID) partitions the information that a set of sources has about a target variable into synergistic, unique, and redundant contributions. This information-theoretic tool has recently attracted attention due to its potential to characterize the information processing in multivariate systems. However, the PID framework still lacks a solid and intuitive interpretation of its information components. With the aim of improving the understanding of PID components, we focus here on Boolean gates, a much-studied type of source-target mechanism. Boolean gates have been extensively characterised via Fourier analysis, whose coefficients have been related to interesting properties of the functions defining the gates. In this paper, we establish for Boolean gate mechanisms a relation between their PID components and Fourier coefficients.
Submitted 14 October, 2020;
originally announced October 2020.
-
Bits and Pieces: Understanding Information Decomposition from Part-whole Relationships and Formal Logic
Authors:
Aaron J. Gutknecht,
Michael Wibral,
Abdullah Makkeh
Abstract:
Partial information decomposition (PID) seeks to decompose the multivariate mutual information that a set of source variables contains about a target variable into basic pieces, the so-called "atoms of information". Each atom describes a distinct way in which the sources may contain information about the target. In this paper we show, first, that the entire theory of partial information decomposition can be derived from considerations of elementary parthood relationships between information contributions. This way of approaching the problem has the advantage of directly characterizing the atoms of information, instead of taking an indirect approach via the concept of redundancy. Secondly, we describe several intriguing links between PID and formal logic. In particular, we show how to define a measure of PID based on the information provided by certain statements about source realizations. Furthermore, we show how the mathematical lattice structure underlying PID theory can be translated into an isomorphic structure of logical statements with a particularly simple ordering relation: logical implication. The conclusion to be drawn from these considerations is that there are three isomorphic "worlds" of partial information decomposition, i.e. three equivalent ways to mathematically describe the decomposition of the information carried by a set of sources about a target: the world of parthood relationships, the world of logical statements, and the world of antichains that was utilized by Williams and Beer in their original exposition of PID theory. We additionally show how the parthood perspective provides a systematic way to answer a type of question that has been much discussed in the PID field: whether a partial information decomposition can be uniquely determined based on concepts other than redundant information.
Submitted 7 March, 2022; v1 submitted 21 August, 2020;
originally announced August 2020.
-
Introducing a differentiable measure of pointwise shared information
Authors:
Abdullah Makkeh,
Aaron J. Gutknecht,
Michael Wibral
Abstract:
Partial information decomposition (PID) of the multivariate mutual information describes the distinct ways in which a set of source variables contains information about a target variable. The groundbreaking work of Williams and Beer has shown that this decomposition cannot be determined from classic information theory without making additional assumptions, and several candidate measures have been proposed, often drawing on principles from related fields such as decision theory. None of these measures is differentiable with respect to the underlying probability mass function. We here present a novel measure that satisfies this property, emerges solely from information-theoretic principles, and has the form of a local mutual information. We show how the measure can be understood from the perspective of exclusions of probability mass, a principle that is foundational to the original definition of the mutual information by Fano. Since our measure is well-defined for individual realizations of the random variables, it lends itself, for example, to local learning in artificial neural networks. We also show that it has a meaningful Möbius inversion on a redundancy lattice and obeys a target chain rule. We give an operational interpretation of the measure based on the decisions that an agent should take if given only the shared information.
Submitted 30 March, 2021; v1 submitted 9 February, 2020;
originally announced February 2020.
-
MAXENT3D_PID: An Estimator for the Maximum-entropy Trivariate Partial Information Decomposition
Authors:
Abdullah Makkeh,
Daniel Chicharro,
Dirk Oliver Theis,
Raul Vicente
Abstract:
Chicharro (2017) introduced a procedure to determine multivariate partial information measures within the maximum entropy framework, separating unique, redundant, and synergistic components of information. Makkeh, Theis, and Vicente (2018) formulated the trivariate case of this partial information measure as a Cone Program. In this paper, we present MAXENT3D_PID, production-quality software that computes the trivariate partial information measure based on the Cone Programming model. We describe our software in detail, explain how to use it, and perform some experiments reflecting its accuracy in estimating the trivariate partial information decomposition.
Submitted 10 January, 2019;
originally announced January 2019.
-
Optimizing Bivariate Partial Information Decomposition
Authors:
Abdullah Makkeh,
Dirk Oliver Theis
Abstract:
None of the BROJA information decomposition measures $\mbox{SI}, \mbox{CI}, \mbox{UIy}, \mbox{UIz}$ are convex or concave over the probability simplex. In this paper, we provide formulas for the sub-gradients and super-gradients of any of the information decomposition measures. Then we apply these results to obtain an optimum of some of these information decomposition measures when optimized over a constrained set of probability distributions.
Submitted 12 February, 2018;
originally announced February 2018.
-
BROJA-2PID: A robust estimator for bivariate partial information decomposition
Authors:
Abdullah Makkeh,
Dirk Oliver Theis,
Raul Vicente
Abstract:
Makkeh, Theis, and Vicente found in [8] that the Cone Programming model is the most robust for computing the Bertschinger et al. partial information decomposition (BROJA PID) measure [1]. We developed production-quality, robust software that computes the BROJA PID measure based on the Cone Programming model. In this paper, we prove the important property of strong duality for the Cone Program and prove an equivalence between the Cone Program and the original Convex problem. We then describe our software in detail and explain how to use it.
Submitted 7 February, 2018;
originally announced February 2018.
-
On the Graph of the Pedigree Polytope
Authors:
Abdullah Makkeh,
Mozhgan Pourmoradnasseri,
Dirk Oliver Theis
Abstract:
Pedigree polytopes are extensions of the classical Symmetric Traveling Salesman Problem polytopes whose graphs (1-skeletons) contain the TSP polytope graphs as spanning subgraphs. While deciding adjacency of vertices in TSP polytopes is coNP-complete, Arthanari has given a combinatorial (polynomially decidable) characterization of adjacency in Pedigree polytopes. Based on this characterization, we study the graphs of Pedigree polytopes asymptotically, for large numbers of cities. Unlike TSP polytope graphs, which are vertex transitive, Pedigree graphs are not even regular. Using an "adjacency game" to handle Arthanari's intricate inductive characterization of adjacency, we prove that the minimum degree is asymptotically equal to the number of vertices, i.e., the graph is "asymptotically almost complete".
Submitted 25 November, 2016;
originally announced November 2016.
-
The Graph of the Pedigree Polytope is Asymptotically Almost Complete (Extended Abstract)
Authors:
Abdullah Makkeh,
Mozhgan Pourmoradnasseri,
Dirk Oliver Theis
Abstract:
Graphs (1-skeletons) of Traveling-Salesman-related polytopes have attracted a lot of attention. Pedigree polytopes are extensions of the classical Symmetric Traveling Salesman Problem polytopes (Arthanari 2000) whose graphs contain the TSP polytope graphs as spanning subgraphs (Arthanari 2013). Unlike TSP polytopes, Pedigree polytopes are not "symmetric", e.g., their graphs are not vertex transitive, not even regular. We show that in the graph of the pedigree polytope, the quotient minimum degree over number of vertices tends to 1 as the number of cities tends to infinity.
Submitted 28 November, 2016; v1 submitted 25 November, 2016;
originally announced November 2016.