-
On Functions of Markov Random Fields
Authors:
Bernhard C. Geiger,
Ali Al-Bashabsheh
Abstract:
We derive two sufficient conditions for a function of a Markov random field (MRF) on a given graph to be an MRF on the same graph. The first condition is information-theoretic and parallels a recent information-theoretic characterization of lumpability of Markov chains. The second condition, which is easier to check, is based on the potential functions of the corresponding Gibbs field. We illustrate our sufficient conditions by means of several examples and discuss implications for practical applications of MRFs. As a side result, we give a partial characterization of functions of MRFs that are information-preserving.
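The question can be tested by brute force in the smallest nontrivial case. The sketch below assumes a three-node path 1-2-3, where being an MRF on the path reduces to the conditional independence X1 ⟂ X3 | X2. It builds a random Markov chain, applies a hypothetical symbol-wise function `f`, and measures I(f(X1); f(X3) | f(X2)). This is a direct numerical check of the Markov property, not either of the paper's sufficient conditions; `markov_path_pmf` and `push_forward` are illustrative helpers.

```python
import itertools

import numpy as np


def cond_mutual_info(p):
    """I(A;C|B) in bits for a joint pmf given as an array p[a, b, c]."""
    p_b = p.sum(axis=(0, 2))
    p_ab = p.sum(axis=2)
    p_bc = p.sum(axis=0)
    total = 0.0
    for a, b, c in itertools.product(*(range(n) for n in p.shape)):
        if p[a, b, c] > 0:
            total += p[a, b, c] * np.log2(p[a, b, c] * p_b[b] / (p_ab[a, b] * p_bc[b, c]))
    return float(total)


def markov_path_pmf(k, rng):
    """Random joint pmf of an MRF on the path 1-2-3, i.e. a Markov chain X1 - X2 - X3."""
    p1 = rng.dirichlet(np.ones(k))               # p(x1)
    t12 = rng.dirichlet(np.ones(k), size=k)      # t12[x1, x2] = p(x2 | x1)
    t23 = rng.dirichlet(np.ones(k), size=k)      # t23[x2, x3] = p(x3 | x2)
    return p1[:, None, None] * t12[:, :, None] * t23[None, :, :]


def push_forward(p, f, m):
    """Joint pmf of (f(X1), f(X2), f(X3)) for a map f: {0..k-1} -> {0..m-1} given as a list."""
    q = np.zeros((m, m, m))
    for idx in itertools.product(*(range(n) for n in p.shape)):
        q[tuple(f[i] for i in idx)] += p[idx]
    return q


rng = np.random.default_rng(0)
p = markov_path_pmf(3, rng)
f = [0, 0, 1]  # hypothetical non-injective function that merges states 0 and 1
print(cond_mutual_info(p))                      # ~0: X is an MRF on the path
print(cond_mutual_info(push_forward(p, f, 2)))  # may be > 0: f(X) need not be
```

An injective `f` always preserves the Markov property; the interesting cases are merging maps like the one above, which is exactly where sufficient conditions are needed.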
Submitted 15 October, 2020; v1 submitted 28 May, 2020;
originally announced May 2020.
-
Aggregated Learning: A Vector-Quantization Approach to Learning Neural Network Classifiers
Authors:
Masoumeh Soflaei,
Hongyu Guo,
Ali Al-Bashabsheh,
Yongyi Mao,
Richong Zhang
Abstract:
We consider the problem of learning a neural network classifier. Under the information bottleneck (IB) principle, we associate with this classification problem a representation learning problem, which we call "IB learning". We show that IB learning is, in fact, equivalent to a special class of the quantization problem. The classical results in rate-distortion theory then suggest that IB learning can benefit from a "vector quantization" approach, namely, simultaneously learning the representations of multiple input objects. Such an approach, assisted by some variational techniques, results in a novel learning framework, "Aggregated Learning", for classification with neural network models. In this framework, several objects are jointly classified by a single neural network. The effectiveness of this framework is verified through extensive experiments on standard image recognition and text classification tasks.
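To make the IB trade-off concrete for discrete variables: the IB Lagrangian of an encoder p(t|x) is I(X;T) - β I(T;Y). The sketch below assumes finite alphabets, an explicit encoder matrix, and the usual Markov chain T - X - Y. It evaluates only this textbook objective; it does not implement the paper's variational bounds or the Aggregated Learning network, and `ib_objective` is a hypothetical name.

```python
import numpy as np


def mutual_info(pxy):
    """I(X;Y) in bits for a joint pmf matrix pxy[x, y]."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log2(pxy[mask] / (px @ py)[mask])))


def ib_objective(pxy, enc, beta):
    """IB Lagrangian I(X;T) - beta * I(T;Y) for an encoder enc[x, t] = p(t | x).

    Assumes the IB Markov chain T - X - Y, so p(t, y) = sum_x p(t|x) p(x, y).
    """
    px = pxy.sum(axis=1)
    pxt = px[:, None] * enc        # p(x, t)
    pty = enc.T @ pxy              # p(t, y)
    return mutual_info(pxt) - beta * mutual_info(pty)


pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
identity = np.eye(2)               # T = X: keeps everything (maximal rate)
constant = np.ones((2, 1))         # T is constant: zero rate and zero relevance
print(ib_objective(pxy, identity, beta=2.0))
print(ib_objective(pxy, constant, beta=2.0))   # 0.0
```

Lower values of the objective are better; β controls how much relevance I(T;Y) is worth per bit of rate I(X;T).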
Submitted 1 June, 2021; v1 submitted 12 January, 2020;
originally announced January 2020.
-
Neural Entropic Estimation: A faster path to mutual information estimation
Authors:
Chung Chan,
Ali Al-Bashabsheh,
Hing Pang Huang,
Michael Lim,
Da Sun Handason Tam,
Chao Zhao
Abstract:
We point out a limitation of the mutual information neural estimation (MINE) where the network fails to learn at the initial training phase, leading to slow convergence in the number of training iterations. To solve this problem, we propose a faster method called the mutual information neural entropic estimation (MI-NEE). Our solution first generalizes MINE to estimate the entropy using a custom reference distribution. The entropy estimate can then be used to estimate the mutual information. We argue that the seemingly redundant intermediate step of entropy estimation allows one to improve the convergence by choosing an appropriate reference distribution. In particular, we show that MI-NEE reduces to MINE in the special case when the reference distribution is the product of marginal distributions, but faster convergence is possible by choosing the uniform distribution as the reference distribution instead. Compared to the product of marginals, the uniform distribution introduces more samples in low-density regions and fewer samples in high-density regions, which appears to lead to an overall larger gradient and hence faster convergence.
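The identities behind the two routes can be checked on a toy discrete distribution. The sketch below assumes a finite alphabet and exact expectations in place of sample averages and a trained neural critic. The Donsker-Varadhan (DV) representation of KL divergence against a uniform reference yields entropies, whose combination recovers I(X;Y); using the product of marginals as the reference instead gives the MINE special case directly. Function names are hypothetical, and with a learned critic each divergence term is only a lower bound; the convergence-speed behaviour described above is not reproduced here.

```python
import numpy as np


def dv_bound(p, q, critic):
    """Donsker-Varadhan lower bound E_p[T] - log E_q[exp(T)] on D(p || q), in nats."""
    return float(np.sum(p * critic) - np.log(np.sum(q * np.exp(critic))))


def entropy_uniform_ref(p, critic):
    """H(p) = log n - D(p || uniform), with D replaced by its DV lower bound.

    Any critic gives an upper bound on H(p); the critic log(p / u) makes it exact.
    """
    u = np.full(p.shape, 1.0 / p.size)
    return float(np.log(p.size) - dv_bound(p, u, critic))


def optimal_entropy(p):
    """Entropy using the optimal critic T = log(p / u) = log(n * p); requires p > 0."""
    return entropy_uniform_ref(p, np.log(p.size * p))


pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

# Entropic route: I(X;Y) = H(X) + H(Y) - H(X,Y), each entropy via a uniform reference.
mi_entropic = optimal_entropy(px) + optimal_entropy(py) - optimal_entropy(pxy)

# MINE route: D(P_XY || P_X x P_Y) with the product of marginals as the reference.
prod = np.outer(px, py)
mi_mine = dv_bound(pxy, prod, np.log(pxy / prod))
print(mi_entropic, mi_mine)   # both equal I(X;Y) in nats
```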
Submitted 30 May, 2019; v1 submitted 30 May, 2019;
originally announced May 2019.
-
Aggregated Learning: A Deep Learning Framework Based on Information-Bottleneck Vector Quantization
Authors:
Hongyu Guo,
Yongyi Mao,
Ali Al-Bashabsheh,
Richong Zhang
Abstract:
Based on the notion of information bottleneck (IB), we formulate a quantization problem called "IB quantization". We show that IB quantization is equivalent to learning based on the IB principle. Under this equivalence, the standard neural network models can be viewed as scalar (single sample) IB quantizers. It is known, from conventional rate-distortion theory, that scalar quantizers are inferior to vector (multi-sample) quantizers. Such a deficiency then inspires us to develop a novel learning framework, AgrLearn, that corresponds to vector IB quantizers for learning with neural networks. Unlike standard networks, AgrLearn simultaneously optimizes against multiple data samples. We experimentally verify that AgrLearn can result in significant improvements when applied to several current deep learning architectures for image recognition and text classification. We also empirically show that AgrLearn can reduce the number of training samples needed for ResNet training by up to 80%.
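A shape-level illustration of "optimizing against multiple samples at once": one map takes a group of n inputs and emits n rows of class logits. The sketch below assumes a single linear layer as a stand-in for the network and a simple concatenation of the n inputs; it is not AgrLearn's architecture or its IB-based training loss, and `aggregated_logits` and `W` are hypothetical.

```python
import numpy as np


def aggregated_logits(x, W, n, k):
    """Map each group of n consecutive inputs jointly to n rows of k class logits.

    x: (B * n, d) batch; W: (n * d, n * k) weights of a hypothetical linear "network".
    """
    bn, d = x.shape
    groups = x.reshape(bn // n, n * d)        # concatenate the n samples of each group
    return (groups @ W).reshape(bn, k)        # split back into one row per sample


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed in a numerically stable way."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


rng = np.random.default_rng(0)
n, d, k, batch = 4, 8, 10, 32                 # 32 samples -> 8 groups of n = 4
x = rng.standard_normal((batch, d))
labels = rng.integers(0, k, size=batch)
W = 0.01 * rng.standard_normal((n * d, n * k))
print(cross_entropy(aggregated_logits(x, W, n, k), labels))
```

With a block-diagonal `W` (the same per-sample weights on each diagonal block) this reduces to an ordinary per-sample classifier, so the aggregated form can only add cross-sample coupling.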
Submitted 11 February, 2019; v1 submitted 26 July, 2018;
originally announced July 2018.
-
Info-Clustering: An Efficient Algorithm by Network Information Flow
Authors:
Chung Chan,
Ali Al-Bashabsheh,
Qiaoqiao Zhou
Abstract:
Motivated by the fact that entities in a social network or biological system often interact by exchanging information, we propose an efficient info-clustering algorithm that can group entities into communities using a parametric max-flow algorithm. This is a meaningful special case of the info-clustering paradigm where the dependency structure is graphical and can be learned readily from data.
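A toy version of the graphical case can be computed exhaustively. The sketch below assumes a pairwise independent network (PIN) model, where each edge (u, v) carries its own independent randomness, shared only by u and v, with entropy w(u, v). Under that assumption the multivariate mutual information of a node set reduces to a minimum over partitions of the cut weight divided by (number of blocks - 1), which is what makes a flow-based method possible. The sketch enumerates all partitions instead of running a parametric max-flow, so it only suits tiny graphs; `pin_mmi`, `info_clusters` and the example weights are illustrative.

```python
import itertools


def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part


def pin_mmi(nodes, weights):
    """MMI of `nodes` in a PIN model: min over partitions P (|P| >= 2) of cut(P) / (|P| - 1)."""
    best = float("inf")
    for part in set_partitions(list(nodes)):
        if len(part) < 2:
            continue
        block = {v: i for i, b in enumerate(part) for v in b}
        cut = sum(w for (u, v), w in weights.items()
                  if u in block and v in block and block[u] != block[v])
        best = min(best, cut / (len(part) - 1))
    return best


def info_clusters(nodes, weights, gamma):
    """Maximal subsets of size >= 2 whose MMI exceeds gamma (exhaustive search)."""
    cands = [frozenset(s) for r in range(2, len(nodes) + 1)
             for s in itertools.combinations(nodes, r) if pin_mmi(s, weights) > gamma]
    return [c for c in cands if not any(c < other for other in cands)]


# Two tightly linked triangles joined by one weak edge (weights in bits).
w = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
     ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1, ("c", "d"): 0.1}
print(info_clusters("abcdef", w, gamma=0.5))   # the two triangles
```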
Submitted 31 January, 2017;
originally announced February 2017.
-
Agglomerative Info-Clustering
Authors:
Chung Chan,
Ali Al-Bashabsheh,
Qiaoqiao Zhou
Abstract:
An agglomerative clustering of random variables is proposed, where clusters of random variables sharing the maximum amount of multivariate mutual information are merged successively to form larger clusters. Compared to the previous info-clustering algorithms, the agglomerative approach allows the computation to stop earlier when clusters of desired size and accuracy are obtained. An efficient algorithm is also derived based on the submodularity of entropy and the duality between the principal sequence of partitions and the principal sequence for submodular functions.
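A naive sketch of the merge loop, assuming the same pairwise independent network (PIN) model as a toy setting, so that the MMI among clusters is a cut-to-partition ratio on the graph with intra-cluster edges ignored. It searches all subsets of the current clusters exhaustively rather than using the submodularity-based algorithm derived in the paper; `mmi_between` and `agglomerate` are hypothetical names. In this example the recorded merge values are non-increasing, giving a hierarchy that can be cut at a desired level.

```python
import itertools


def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part


def mmi_between(groups, weights):
    """PIN-model MMI among groups of nodes, each group treated as one random variable."""
    owner = {v: i for i, g in enumerate(groups) for v in g}
    best = float("inf")
    for part in set_partitions(list(range(len(groups)))):
        if len(part) < 2:
            continue
        block = {g: j for j, b in enumerate(part) for g in b}
        cut = sum(w for (u, v), w in weights.items()
                  if u in owner and v in owner and block[owner[u]] != block[owner[v]])
        best = min(best, cut / (len(part) - 1))
    return best


def agglomerate(nodes, weights):
    """Repeatedly merge the set of current clusters sharing the largest MMI."""
    clusters = [frozenset([v]) for v in nodes]
    history = []
    while len(clusters) > 1:
        value, chosen = max(
            ((mmi_between(sub, weights), sub)
             for r in range(2, len(clusters) + 1)
             for sub in itertools.combinations(clusters, r)),
            key=lambda item: (item[0], len(item[1])))   # ties: prefer the larger merge
        merged = frozenset().union(*chosen)
        clusters = [c for c in clusters if c not in chosen] + [merged]
        history.append((value, merged))
    return history


w = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
     ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1, ("c", "d"): 0.1}
for value, cluster in agglomerate("abcdef", w):
    print(round(value, 3), sorted(cluster))
# merges {a,b,c} and {d,e,f} at 1.5, then everything at 0.1
```

Stopping the loop once the merge value drops below a chosen threshold is the early-stopping behaviour described in the abstract.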
Submitted 24 February, 2017; v1 submitted 17 January, 2017;
originally announced January 2017.
-
Duality between Feature Selection and Data Clustering
Authors:
Chung Chan,
Ali Al-Bashabsheh,
Qiaoqiao Zhou,
Tie Liu
Abstract:
The feature-selection problem is formulated from an information-theoretic perspective. We show that the problem can be efficiently solved by an extension of the recently proposed info-clustering paradigm. This reveals the fundamental duality between feature selection and data clustering, which is a consequence of the more general duality between the principal partition and the principal lattice of partitions in combinatorial optimization.
Submitted 5 October, 2016; v1 submitted 27 September, 2016;
originally announced September 2016.
-
On Information-Theoretic Characterizations of Markov Random Fields and Subfields
Authors:
Raymond W. Yeung,
Ali Al-Bashabsheh,
Chao Chen,
Qi Chen,
Pierre Moulin
Abstract:
Let $X_i, i \in V$ form a Markov random field (MRF) represented by an undirected graph $G = (V,E)$, and $V'$ be a subset of $V$.
We determine the smallest graph that can always represent the subfield $X_i, i \in V'$ as an MRF. Based on this result, we obtain a necessary and sufficient condition for a subfield of a Markov tree to be also a Markov tree. When $G$ is a path so that $X_i, i \in V$ form a Markov chain, it is known that the $I$-Measure is always nonnegative and the information diagram assumes a very special structure (Kawabata and Yeung, 1992). We prove that the Markov chain is essentially the only MRF for which the $I$-Measure is always nonnegative. By applying our characterization of the smallest graph representation of a subfield of an MRF, we develop a recursive approach for constructing information diagrams for MRFs. Our work builds on the set-theoretic characterization of an MRF in Yeung, Lee, and Ye (2002).
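For three variables, the seven atoms of the $I$-Measure follow from joint entropies by inclusion-exclusion, so the nonnegativity statement can be checked on examples. The sketch below assumes finite alphabets. For a Markov chain X1 - X2 - X3 the atom I(X1;X3|X2) vanishes and every remaining atom is nonnegative. By contrast, X3 = X1 XOR X2 with independent uniform bits (an MRF only on the complete graph) has the negative atom I(X1;X2;X3) = -1. Function names are illustrative, and the code checks examples rather than proving the result.

```python
import itertools

import numpy as np


def joint_entropy(p, keep):
    """Entropy in bits of the marginal of the joint pmf p on the axes in `keep`."""
    drop = tuple(a for a in range(p.ndim) if a not in keep)
    m = (p.sum(axis=drop) if drop else p).ravel()
    m = m[m > 0]
    return float(-(m * np.log2(m)).sum())


def i_measure_atoms(p):
    """All seven atoms of the I-Measure for a joint pmf p[x1, x2, x3]."""
    h = {s: joint_entropy(p, s) for r in (1, 2, 3) for s in itertools.combinations(range(3), r)}
    atoms = {}
    for i, j, k in [(0, 1, 2), (1, 0, 2), (2, 0, 1)]:          # H(Xi | Xj, Xk)
        atoms[f"H(X{i+1}|X{j+1},X{k+1})"] = h[(0, 1, 2)] - h[tuple(sorted((j, k)))]
    for i, j, k in [(0, 1, 2), (0, 2, 1), (1, 2, 0)]:          # I(Xi; Xj | Xk)
        atoms[f"I(X{i+1};X{j+1}|X{k+1})"] = (h[tuple(sorted((i, k)))] + h[tuple(sorted((j, k)))]
                                             - h[(k,)] - h[(0, 1, 2)])
    atoms["I(X1;X2;X3)"] = (h[(0,)] + h[(1,)] + h[(2,)]
                            - h[(0, 1)] - h[(0, 2)] - h[(1, 2)] + h[(0, 1, 2)])
    return atoms


rng = np.random.default_rng(0)
p1 = rng.dirichlet(np.ones(3))
t12 = rng.dirichlet(np.ones(3), size=3)                        # p(x2 | x1)
t23 = rng.dirichlet(np.ones(3), size=3)                        # p(x3 | x2)
chain = p1[:, None, None] * t12[:, :, None] * t23[None, :, :]  # X1 - X2 - X3

xor = np.zeros((2, 2, 2))                                      # X3 = X1 xor X2
for a in range(2):
    for b in range(2):
        xor[a, b, a ^ b] = 0.25

print(min(i_measure_atoms(chain).values()) >= -1e-12)   # True
print(i_measure_atoms(xor)["I(X1;X2;X3)"])              # -1.0
```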
Submitted 17 January, 2018; v1 submitted 12 August, 2016;
originally announced August 2016.
-
A Factor-Graph Approach to Algebraic Topology, With Applications to Kramers--Wannier Duality
Authors:
Ali Al-Bashabsheh,
Pascal O. Vontobel
Abstract:
Algebraic topology studies topological spaces with the help of tools from abstract algebra. The main focus of this paper is to show that many concepts from algebraic topology can be conveniently expressed in terms of (normal) factor graphs. As an application, we give an alternative proof of a classical duality result of Kramers and Wannier, which expresses the partition function of the two-dimensional Ising model at a low temperature in terms of the partition function of the two-dimensional Ising model at a high temperature. Moreover, we discuss analogous results for the three-dimensional Ising model and the Potts model.
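One standard ingredient of Kramers-Wannier duality is the high-temperature expansion. Writing $e^{\beta s_i s_j} = \cosh\beta\,(1 + s_i s_j \tanh\beta)$ and summing over spins leaves only edge subsets in which every vertex has even degree, which are the same objects counted by the low-temperature expansion on the dual lattice. The sketch below assumes a zero-field Ising model with unit coupling on a small grid with free boundary, and verifies the expansion numerically by brute force. It is not the factor-graph proof from the paper; function names are illustrative.

```python
import itertools
import math


def grid_edges(n):
    """Nearest-neighbour edges of an n x n grid with free boundary; node (r, c) -> r * n + c."""
    edges = []
    for r in range(n):
        for c in range(n):
            if c + 1 < n:
                edges.append((r * n + c, r * n + c + 1))
            if r + 1 < n:
                edges.append((r * n + c, (r + 1) * n + c))
    return edges


def ising_z_brute(num_nodes, edges, beta):
    """Z = sum over spins s in {-1, +1}^V of exp(beta * sum_{(i,j) in E} s_i s_j)."""
    return sum(math.exp(beta * sum(s[i] * s[j] for i, j in edges))
               for s in itertools.product((-1, 1), repeat=num_nodes))


def ising_z_high_temperature(num_nodes, edges, beta):
    """Z = 2^|V| cosh(beta)^|E| * sum over even subgraphs E' of tanh(beta)^|E'|."""
    t = math.tanh(beta)
    total = 0.0
    for mask in itertools.product((0, 1), repeat=len(edges)):
        degree = [0] * num_nodes
        for used, (i, j) in zip(mask, edges):
            if used:
                degree[i] += 1
                degree[j] += 1
        if all(d % 2 == 0 for d in degree):   # only even subgraphs survive the spin sum
            total += t ** sum(mask)
    return 2 ** num_nodes * math.cosh(beta) ** len(edges) * total


edges = grid_edges(3)   # 9 spins, 12 couplings
for beta in (0.1, 0.5, 1.0):
    print(beta, math.isclose(ising_z_brute(9, edges, beta),
                             ising_z_high_temperature(9, edges, beta), rel_tol=1e-9))
```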
Submitted 13 July, 2018; v1 submitted 4 July, 2016;
originally announced July 2016.
-
Incremental and Decremental Secret Key Agreement
Authors:
Chung Chan,
Ali Al-Bashabsheh,
Qiaoqiao Zhou
Abstract:
We study the rate of change of the multivariate mutual information among a set of random variables when some common randomness is added to or removed from a subset. This is formulated more precisely as two new multiterminal secret key agreement problems, which ask how one can increase the secrecy capacity efficiently by adding common randomness to a small subset of users, and how one can simplify the source model by removing redundant common randomness that does not contribute to the secrecy capacity. We clarify the combinatorial structure of these problems and point out some meaningful open problems.
Submitted 6 May, 2016;
originally announced May 2016.
-
Info-Clustering: A Mathematical Theory for Data Clustering
Authors:
Chung Chan,
Ali Al-Bashabsheh,
Qiaoqiao Zhou,
Tarik Kaced,
Tie Liu
Abstract:
We formulate an info-clustering paradigm based on a multivariate information measure, called multivariate mutual information, that naturally extends Shannon's mutual information between two random variables to the multivariate case involving more than two random variables. With proper model reductions, we show that the paradigm can be applied to study the human genome and connectome in a more meaningful way than the conventional algorithmic approach. Not only can info-clustering provide justifications and refinements to some existing techniques, but it also inspires new computationally feasible solutions.
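For a handful of discrete variables, the multivariate mutual information can be evaluated straight from a partition-based definition: $I(Z_V) = \min_{\mathcal{P}} \frac{1}{|\mathcal{P}|-1}\big[\sum_{C \in \mathcal{P}} H(Z_C) - H(Z_V)\big]$ over partitions $\mathcal{P}$ of $V$ into at least two blocks. For $|V| = 2$ this reduces to Shannon's $I(Z_1;Z_2)$. The definition is stated here as an assumption rather than taken from the abstract. The sketch enumerates all partitions, so it only scales to a few variables; `mmi` is a hypothetical helper.

```python
import numpy as np


def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part


def entropy(p, axes):
    """Entropy in bits of the marginal of the joint pmf p on the given axes."""
    drop = tuple(a for a in range(p.ndim) if a not in axes)
    m = (p.sum(axis=drop) if drop else p).ravel()
    m = m[m > 0]
    return float(-(m * np.log2(m)).sum())


def mmi(p):
    """Multivariate mutual information of the variables indexed by the axes of p."""
    v = list(range(p.ndim))
    h_all = entropy(p, v)
    return min((sum(entropy(p, b) for b in part) - h_all) / (len(part) - 1)
               for part in set_partitions(v) if len(part) >= 2)


pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
copies = np.zeros((2, 2, 2))
copies[0, 0, 0] = copies[1, 1, 1] = 0.5   # Z1 = Z2 = Z3 = one uniform bit
print(mmi(pxy))      # Shannon I(Z1;Z2) = 1 - h(0.2), about 0.278 bits
print(mmi(copies))   # 1.0: all three variables share one bit
```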
Submitted 11 December, 2016; v1 submitted 4 May, 2016;
originally announced May 2016.
-
On Stochastic Estimation of Partition Function
Authors:
Ali Al-Bashabsheh,
Yongyi Mao
Abstract:
In this paper, we show analytically that the duality of normal factor graphs (NFG) can facilitate stochastic estimation of partition functions. In particular, our analysis suggests that for the $q$-ary two-dimensional nearest-neighbor Potts model, sampling from the primal NFG of the model and sampling from its dual exhibit opposite behaviours with respect to the temperature of the model. For high-temperature models, sampling from the primal NFG gives rise to better estimators, whereas for low-temperature models, sampling from the dual gives rise to better estimators. This analysis is validated by experiments.
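The simplest stochastic estimator samples configurations uniformly and rescales: $Z = q^N\,\mathbb{E}_{\text{unif}}[e^{\beta A(\sigma)}]$, where $A(\sigma)$ counts agreeing neighbour pairs. The sketch below assumes a $q$-state Potts model with unit coupling on a small $n \times n$ grid with free boundary. It implements only this naive primal-side estimator, not the dual-NFG estimator analyzed in the paper, but it shows the temperature dependence on the primal side: at small β the integrand is nearly constant, while at large β a few rare ordered configurations dominate $Z$. Function names are illustrative.

```python
import itertools

import numpy as np


def grid_edges(n):
    """Nearest-neighbour edges of an n x n grid with free boundary; node (r, c) -> r * n + c."""
    edges = []
    for r in range(n):
        for c in range(n):
            if c + 1 < n:
                edges.append((r * n + c, r * n + c + 1))
            if r + 1 < n:
                edges.append((r * n + c, (r + 1) * n + c))
    return edges


def potts_z_exact(q, n, beta):
    """Z = sum over all q^(n*n) states of exp(beta * number of agreeing neighbour pairs)."""
    edges = np.array(grid_edges(n))
    states = np.array(list(itertools.product(range(q), repeat=n * n)))
    agree = (states[:, edges[:, 0]] == states[:, edges[:, 1]]).sum(axis=1)
    return float(np.exp(beta * agree).sum())


def potts_z_uniform_mc(q, n, beta, num_samples, rng):
    """Unbiased estimate Z = q^(n*n) * E_uniform[exp(beta * agreements)]."""
    edges = np.array(grid_edges(n))
    states = rng.integers(0, q, size=(num_samples, n * n))
    agree = (states[:, edges[:, 0]] == states[:, edges[:, 1]]).sum(axis=1)
    return float(q ** (n * n) * np.exp(beta * agree).mean())


rng = np.random.default_rng(0)
for beta in (0.2, 2.0):   # high temperature, then low temperature
    exact = potts_z_exact(3, 3, beta)
    estimate = potts_z_uniform_mc(3, 3, beta, 20000, rng)
    print(beta, abs(estimate - exact) / exact)   # expect a much noisier estimate at beta = 2.0
```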
Submitted 28 January, 2014;
originally announced January 2014.
-
Normal Factor Graphs as Probabilistic Models
Authors:
Ali Al-Bashabsheh,
Yongyi Mao
Abstract:
We present a new probabilistic modelling framework based on the recent notion of normal factor graph (NFG). We show that the proposed NFG models and their transformations unify some existing models such as factor graphs, convolutional factor graphs, and cumulative distribution networks. The two subclasses of the NFG models, namely the constrained and generative models, exhibit a duality in their dependence structure. Transformation of NFG models further extends the power of this modelling framework. We point out the well-known NFG representations of parity and generator realizations of a linear code as generative and constrained models, and comment on a more prevailing duality in this context. Finally, we address the algorithmic aspect of computing the exterior function of NFGs and the inference problem on NFGs.
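In an NFG, variables live on edges, and the exterior function is the sum, over all internal edges, of the product of the local functions, with the dangling edges left as arguments. The sketch below implements that definition with `numpy.einsum`, assuming finite alphabets and at most 26 edges (one letter per edge); `exterior_function` is a hypothetical helper, not an API from the paper. In the example, a two-vertex generative model has the marginal p(x1, x3) of a Markov chain as its exterior function.

```python
from collections import Counter

import numpy as np


def exterior_function(factors, dangling):
    """Exterior function of a normal factor graph (NFG).

    factors: list of (array, edge_names); each axis of an array is one edge.
    An edge named in two factors is internal and summed over; an edge named in
    one factor is dangling and becomes an argument, in the order of `dangling`.
    """
    counts = Counter(e for _, names in factors for e in names)
    assert all(c <= 2 for c in counts.values()), "an NFG edge joins at most two vertices"
    assert set(dangling) == {e for e, c in counts.items() if c == 1}, "list every dangling edge"
    letters = {}
    for e in counts:                        # one einsum letter per edge
        letters[e] = chr(ord("a") + len(letters))
    inputs = ",".join("".join(letters[e] for e in names) for _, names in factors)
    output = "".join(letters[e] for e in dangling)
    return np.einsum(f"{inputs}->{output}", *(arr for arr, _ in factors))


rng = np.random.default_rng(0)
p1 = rng.dirichlet(np.ones(3))
t12 = rng.dirichlet(np.ones(3), size=3)     # p(x2 | x1)
t23 = rng.dirichlet(np.ones(3), size=3)     # p(x3 | x2)
# Vertex f(x1, x2) = p(x1) p(x2|x1) and vertex g(x2, x3) = p(x3|x2) share internal edge x2.
f = p1[:, None] * t12
p13 = exterior_function([(f, ["x1", "x2"]), (t23, ["x2", "x3"])], ["x1", "x3"])
print(np.allclose(p13, f @ t23), p13.sum())   # True, ~1.0
```

`np.einsum` also accepts `optimize=True`, which chooses a contraction order; for larger NFGs that order is what an efficient algorithm for the exterior function has to get right.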
Submitted 14 September, 2012;
originally announced September 2012.
-
Normal Factor Graphs: A Diagrammatic Approach to Linear Algebra
Authors:
Ali Al-Bashabsheh,
Yongyi Mao,
Pascal O. Vontobel
Abstract:
Inspired by some new advances on normal factor graphs (NFGs), we introduce NFGs as a simple and intuitive diagrammatic approach towards encoding some concepts from linear algebra. We illustrate with examples the workings of such an approach and settle a conjecture of Peterson on the Pfaffian.
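As a small illustration of the diagrammatic view: a matrix is a vertex of degree two, joining two matrices by an edge is matrix multiplication, and closing a ring of matrices gives a trace. Cyclic invariance of the trace is then just a relabeling of the same diagram, and two unconnected vertices give a Kronecker product. The sketch below assumes `numpy.einsum` as the contraction engine; `ring_nfg` is a hypothetical helper, and the Pfaffian result mentioned above is not reproduced.

```python
import numpy as np


def ring_nfg(*mats):
    """Exterior function of a closed NFG whose vertices are matrices joined in a ring.

    Vertex t carries mats[t] with edges (e_t, e_{t+1}); the last edge wraps around,
    so every edge is internal and the result is the scalar tr(M_0 M_1 ... M_{k-1}).
    """
    k = len(mats)
    letters = [chr(ord("a") + t) for t in range(k)]
    spec = ",".join(letters[t] + letters[(t + 1) % k] for t in range(k)) + "->"
    return np.einsum(spec, *mats)


rng = np.random.default_rng(1)
A, B, C = (rng.standard_normal((3, 3)) for _ in range(3))
print(np.isclose(ring_nfg(A, B, C), np.trace(A @ B @ C)))   # True
print(np.isclose(ring_nfg(A, B, C), ring_nfg(C, A, B)))     # True: same ring, cyclic trace
# Two unconnected vertices with all four edges dangling: the Kronecker product.
kron = np.einsum("ij,kl->ikjl", A, B).reshape(9, 9)
print(np.allclose(kron, np.kron(A, B)))                     # True
```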
Submitted 1 June, 2011; v1 submitted 28 February, 2011;
originally announced February 2011.
-
On Holant Theorem and Its Proof
Authors:
Ali Al-Bashabsheh,
Yongyi Mao,
Abbas Yongacoglu
Abstract:
Holographic algorithms are a recent breakthrough in computer science and have found applications in information theory. This paper provides a proof of the central component of holographic algorithms, namely, the Holant theorem. Compared with previous works, the proof appears simpler and more direct. Along the way, we also develop a mathematical tool, which we call c-tensor. We expect the notion of c-tensor to be applicable to a wide range of analyses.
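The invariance at the heart of the Holant theorem can be checked numerically: inserting a basis change T on one side of every edge of a bipartite graph and its inverse on the other side leaves the total sum unchanged, because each edge then contributes T T⁻¹ = I. The sketch below uses a two-vertex graph with three parallel edges and assumes real-valued local functions. It is a numerical sanity check, not the c-tensor proof, and the function names are hypothetical.

```python
import numpy as np


def holant_theta(f, g):
    """Value of a closed bipartite NFG: vertices f and g joined by three edges."""
    return float(np.einsum("abc,abc->", f, g))


def holographic_transform(f, g, T):
    """Insert T on the f side and T^{-1} on the g side of every edge."""
    T_inv = np.linalg.inv(T)
    f_new = np.einsum("abc,ax,by,cz->xyz", f, T, T, T)
    g_new = np.einsum("xa,yb,zc,abc->xyz", T_inv, T_inv, T_inv, g)
    return f_new, g_new


rng = np.random.default_rng(2)
f = rng.standard_normal((2, 2, 2))
g = rng.standard_normal((2, 2, 2))
T = np.array([[1.0, 1.0], [1.0, -1.0]])      # any invertible 2 x 2 basis change
print(np.isclose(holant_theta(f, g), holant_theta(*holographic_transform(f, g, T))))  # True
```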
Submitted 8 May, 2010;
originally announced May 2010.
-
Normal Factor Graphs and Holographic Transformations
Authors:
Ali Al-Bashabsheh,
Yongyi Mao
Abstract:
This paper stands at the intersection of two distinct lines of research. One line is "holographic algorithms," a powerful approach introduced by Valiant for solving various counting problems in computer science; the other is "normal factor graphs," an elegant framework proposed by Forney for representing codes defined on graphs. We introduce the notion of holographic transformations for normal factor graphs, and establish a very general theorem, called the generalized Holant theorem, which relates a normal factor graph to its holographic transformation. We show that the generalized Holant theorem on the one hand underlies the principle of holographic algorithms, and on the other hand reduces to a general duality theorem for normal factor graphs, a special case of which was first proved by Forney. In the course of our development, we formalize a new semantics for normal factor graphs, which highlights various linear algebraic properties that potentially enable the use of normal factor graphs as a linear algebraic tool.
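The duality special case can be seen on the smallest example. Applying the binary Fourier (Hadamard) matrix [[1, 1], [1, -1]] to every argument of the even-parity indicator yields a scaled equality indicator, and vice versa, mirroring the parity/repetition pair of dual codes. The sketch below assumes binary alphabets and unnormalized transforms (hence the factors 4 and 2); helper names are illustrative.

```python
import itertools

import numpy as np


def indicator(pred, n, q=2):
    """Array f[x1, ..., xn] equal to 1.0 where pred(x) holds and 0.0 elsewhere."""
    f = np.zeros((q,) * n)
    for x in itertools.product(range(q), repeat=n):
        f[x] = 1.0 if pred(x) else 0.0
    return f


def hadamard_transform(f):
    """Apply the unnormalized binary Fourier (Hadamard) matrix to every argument of f."""
    H = np.array([[1.0, 1.0], [1.0, -1.0]])
    for axis in range(f.ndim):
        f = np.moveaxis(np.tensordot(H, f, axes=([1], [axis])), 0, axis)
    return f


parity = indicator(lambda x: sum(x) % 2 == 0, 3)       # even-parity (check) constraint
equality = indicator(lambda x: len(set(x)) == 1, 3)    # equality (repetition) constraint
print(np.allclose(hadamard_transform(parity), 4 * equality))   # True
print(np.allclose(hadamard_transform(equality), 2 * parity))   # True
```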
Submitted 3 February, 2011; v1 submitted 21 April, 2010;
originally announced April 2010.
-
On the k-pairs problem
Authors:
Ali Al-Bashabsheh,
Abbas Yongacoglu
Abstract:
We consider network coding rates for directed and undirected $k$-pairs networks. For directed networks, meagerness is known to be an upper bound on network coding rates. We show that the network coding rate can be smaller than meagerness by a multiplicative factor of $\Theta(|V|)$. For the undirected case, we make some progress toward the $k$-pairs conjecture.
Submitted 30 April, 2008;
originally announced May 2008.
-
On the Capacity Bounds of Undirected Networks
Authors:
Ali Al-Bashabsheh,
Abbas Yongacoglu
Abstract:
In this work, we improve on the bounds presented by Li and Li for network coding gain in the undirected case. A tightened bound for the undirected multicast problem with three terminals is derived. An interesting result shows that, with fractional routing, routing throughput can achieve at least 75% of the coding throughput. A tighter bound for the general multicast problem with any number of terminals shows that the coding gain is strictly less than 2. Our derived bound depends on the number of terminals in the multicast network and approaches 2 as the number of terminals grows arbitrarily large.
Submitted 28 April, 2008;
originally announced April 2008.