Deeper Interpretability of Deep Networks

Authors: Tian Xu, Jiayu Zhan, Oliver G. B. Garrod, Philip H. S. Torr, Song-Chun Zhu, Robin A. A. Ince, Philippe G. Schyns

Abstract: Deep Convolutional Neural Networks (CNNs) have been one of the most influential recent developments in computer vision, particularly for categorization. There is an increasing demand for explainable AI as these systems are deployed in the real world. However, understanding the information represented and processed in CNNs remains challenging in most cases. In this paper, we explore the use of new information theoretic techniques developed in the field of neuroscience to enable novel understanding of how a CNN represents information. We trained a 10-layer ResNet architecture to identify 2,000 face identities from 26M images generated using a rigorously controlled 3D face rendering model that produced variations of intrinsic (i.e. face morphology, gender, age, expression and ethnicity) and extrinsic factors (i.e. 3D pose, illumination, scale and 2D translation). With our methodology, we demonstrate that, unlike humans, the network overgeneralizes face identities even with extreme changes of face shape, but it is more sensitive to changes of texture. To understand the processing of information underlying these counterintuitive properties, we visualize the features of shape and texture that the network processes to identify faces. Then, we shed light on the inner workings of the black box and reveal how hidden layers represent these features and whether the representations are invariant to pose. We hope that our methodology will provide an additional valuable tool for interpretability of CNNs.

Submitted 20 November, 2018; v1 submitted 19 November, 2018; originally announced November 2018.

arXiv:1803.02030 [pdf, other]

doi 10.3390/e20040240

Exact partial information decompositions for Gaussian systems based on dependency constraints

Authors: James W. Kay, Robin A. A. Ince

Abstract: The Partial Information Decomposition (PID) [arXiv:1004.2515] provides a theoretical framework to characterize and quantify the structure of multivariate information sharing. A new method (Idep) has recently been proposed for computing a two-predictor PID over discrete spaces [arXiv:1709.06653]. A lattice of maximum entropy probability models is constructed based on marginal dependency constraints, and the unique information that a particular predictor has about the target is defined as the minimum increase in joint predictor-target mutual information when that particular predictor-target marginal dependency is constrained. Here, we apply the Idep approach to Gaussian systems, for which the marginally constrained maximum entropy models are Gaussian graphical models. Closed form solutions for the Idep PID are derived for both univariate and multivariate Gaussian systems. Numerical and graphical illustrations are provided, together with practical and theoretical comparisons of the Idep PID with the minimum mutual information PID (Immi) [arXiv:1411.2832]. In particular, it is proved that the Immi method generally produces larger estimates of redundancy and synergy than does the Idep method. In discussion of the practical examples, the PIDs are complemented by the use of deviance tests for the comparison of Gaussian graphical models.

Submitted 6 March, 2018; originally announced March 2018.

Comments: 39 pages, 9 figures, 9 tables

Journal ref: Entropy 2018, 20(4), 240
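
For orientation, the minimum mutual information PID (Immi) that the abstract above uses as its comparison baseline can be sketched in a few lines for a jointly Gaussian system. This is only an illustrative sketch, not the Idep method and not the authors' code: the function names `gaussian_mi` and `mmi_pid`, the variable ordering, and the example covariance matrix are all assumptions made for the illustration. It takes redundancy to be the smaller of the two single-predictor mutual informations and derives the remaining terms from the standard two-predictor PID identities.

```python
import numpy as np

def gaussian_mi(cov, idx_a, idx_b):
    """Mutual information (bits) between two blocks of a jointly Gaussian vector.

    cov is the full covariance matrix; idx_a and idx_b are lists of variable indices.
    """
    cov = np.asarray(cov, dtype=float)
    idx_ab = list(idx_a) + list(idx_b)
    # slogdet returns (sign, log|det|); only the log-determinant is needed here.
    _, logdet_a = np.linalg.slogdet(cov[np.ix_(idx_a, idx_a)])
    _, logdet_b = np.linalg.slogdet(cov[np.ix_(idx_b, idx_b)])
    _, logdet_ab = np.linalg.slogdet(cov[np.ix_(idx_ab, idx_ab)])
    return 0.5 * (logdet_a + logdet_b - logdet_ab) / np.log(2)

def mmi_pid(cov, x1, x2, s):
    """Two-predictor PID with redundancy defined as min(I(X1;S), I(X2;S))."""
    i1 = gaussian_mi(cov, x1, s)
    i2 = gaussian_mi(cov, x2, s)
    i12 = gaussian_mi(cov, list(x1) + list(x2), s)
    red = min(i1, i2)
    return {
        "redundancy": red,
        "unique_1": i1 - red,
        "unique_2": i2 - red,
        "synergy": i12 - i1 - i2 + red,
    }

# Hypothetical example: variables ordered (X1, X2, S) with unit variances.
cov = np.array([
    [1.0, 0.2, 0.5],
    [0.2, 1.0, 0.3],
    [0.5, 0.3, 1.0],
])
pid = mmi_pid(cov, x1=[0], x2=[1], s=[2])
print(pid)
```

By construction the four terms sum to the joint mutual information I(X1,X2;S), and one of the two unique terms is always zero under this redundancy definition.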

arXiv:1702.01591 [pdf, other]

The Partial Entropy Decomposition: Decomposing multivariate entropy and mutual information via pointwise common surprisal

Authors: Robin A. A. Ince

Abstract: Obtaining meaningful quantitative descriptions of the statistical dependence within multivariate systems is a difficult open problem. Recently, the Partial Information Decomposition (PID) was proposed to decompose mutual information (MI) about a target variable into components which are redundant, unique and synergistic within different subsets of predictor variables. Here, we propose to apply the elegant formalism of the PID to multivariate entropy, resulting in a Partial Entropy Decomposition (PED). We implement the PED with an entropy redundancy measure based on pointwise common surprisal; a natural definition which is closely related to the definition of MI. We show how this approach can reveal the dyadic vs triadic generative structure of multivariate systems that are indistinguishable with classical Shannon measures. The entropy perspective also shows that misinformation is synergistic entropy and hence that MI itself includes both redundant and synergistic effects. We show the relationships between the PED and MI in two predictors, and derive two alternative information decompositions which we illustrate on several example systems. This reveals that in entropy terms, univariate predictor MI is not a proper subset of the joint MI, and we suggest this previously unrecognised fact explains in part why obtaining a consistent PID has proven difficult.
The PED also allows separate quantification of mechanistic redundancy (related to the function of the system) versus source redundancy (arising from dependencies between inputs); an important distinction which no existing methods can address. The new perspective provided by the PED helps to clarify some of the difficulties encountered with the PID approach and the resulting decompositions provide useful tools for practical data analysis across a wide range of application areas.

Submitted 20 February, 2017; v1 submitted 6 February, 2017; originally announced February 2017.

Comments: Added Section 3.7 (Quantifying source vs mechanistic redundancy) and Section 3.8 (Shared entropy as a measure of dependence: pure mutual information) and updated abstract, results, and discussion accordingly

arXiv:1602.05063 [pdf, other]

doi 10.3390/e19070318

Measuring multivariate redundant information with pointwise common change in surprisal

Authors: Robin A. A. Ince

Abstract: The problem of how to properly quantify redundant information is an open question that has been the subject of much recent research. Redundant information refers to information about a target variable S that is common to two or more predictor variables Xi. It can be thought of as quantifying overlapping information content or similarities in the representation of S between the Xi. We present a new measure of redundancy which measures the common change in surprisal shared between variables at the local or pointwise level. We provide a game-theoretic operational definition of unique information, and use this to derive constraints which are used to obtain a maximum entropy distribution. Redundancy is then calculated from this maximum entropy distribution by counting only those local co-information terms which admit an unambiguous interpretation as redundant information. We show how this redundancy measure can be used within the framework of the Partial Information Decomposition (PID) to give an intuitive decomposition of the multivariate mutual information into redundant, unique and synergistic contributions. We compare our new measure to existing approaches over a range of example systems, including continuous Gaussian variables. Matlab code for the measure is provided, including all considered examples.

Submitted 13 July, 2017; v1 submitted 16 February, 2016; originally announced February 2016.

Comments: v3: revisions based on review process at Entropy (expand game-theory and max-ent motivation), v2: add game-theoretic operational definition for maximum entropy constraints; remove thresholding and normalisation of values on lattice

Journal ref: Entropy 2017, 19(7), 318

Showing 1–4 of 4 results for author: Ince, R A A