A Concise Mathematical Description of Active Inference in Discrete Time

Authors: Jesse van Oostrum, Carlotta Langer, Nihat Ay

Abstract: In this paper we present a concise mathematical description of active inference in discrete time. The main part of the paper serves as a general introduction to the topic, including an example illustrating the theory on action selection. In the appendix the more subtle mathematical details are discussed. This part is aimed at readers who have already studied the active inference literature but struggle to make sense of the mathematical details and derivations. Throughout the whole manuscript, special attention has been paid to adopting notation that is both precise and in line with standard mathematical texts. All equations and derivations are linked to specific equation numbers in other popular texts on the topic. Furthermore, Python code is provided that implements the action selection mechanism described in this paper and is compatible with pymdp environments.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2307.11249 [pdf, other]

On the Fisher-Rao Gradient of the Evidence Lower Bound

Authors: Nihat Ay, Jesse van Oostrum

Abstract: This article studies the Fisher-Rao gradient, also referred to as the natural gradient, of the evidence lower bound, the ELBO, which plays a crucial role within the theory of the Variational Autoencoder, the Helmholtz Machine and the Free Energy Principle. The natural gradient of the ELBO is related to the natural gradient of the Kullback-Leibler divergence from a target distribution, the prime objective function of learning. Based on invariance properties of gradients within information geometry, conditions on the underlying model are provided that ensure the equivalence of minimising the prime objective function and the maximisation of the ELBO.

Submitted 20 July, 2023; originally announced July 2023.

arXiv:2212.10649 [pdf, ps, other]

doi 10.1016/j.ijar.2023.109042

Inversion of Bayesian Networks

Authors: Jesse van Oostrum, Peter van Hintum, Nihat Ay

Abstract: Variational autoencoders and Helmholtz machines use a recognition network (encoder) to approximate the posterior distribution of a generative model (decoder). In this paper we study the necessary and sufficient properties of a recognition network so that it can model the true posterior distribution exactly. These results are derived in the general context of probabilistic graphical modelling / Bayesian networks, for which the network represents a set of conditional independence statements. We derive both global conditions, in terms of d-separation, and local conditions for the recognition network to have the desired qualities. It turns out that for the local conditions the property perfectness (for every node, all parents are joined) plays an important role.

Submitted 2 November, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

Journal ref: International Journal of Approximate Reasoning, Volume 164, 2024

arXiv:2209.01418 [pdf, other]

Outsourcing Control requires Control Complexity

Authors: Carlotta Langer, Nihat Ay

Abstract: An embodied agent constantly influences its environment and is influenced by it. We use the sensorimotor loop to model these interactions and thereby we can quantify different information flows in the system by various information theoretic measures. This includes a measure for the interaction among the agent's body and its environment, called Morphological Computation. Additionally, we examine the controller complexity by two measures, one of which can be seen in the context of the Integrated Information Theory of consciousness. Applying this framework to an experimental setting with simulated agents allows us to analyze the interaction between an agent and its environment, as well as the complexity of its controller, the brain of the agent. Previous research reveals an antagonistic relationship between the controller complexity and Morphological Computation. A morphology adapted well to a task can reduce the necessary complexity of the controller significantly. This creates the problem that embodied intelligence is correlated with a reduced necessity of a controller, a brain. However, in order to interact well with their surroundings, the agents first have to understand the relevant dynamics of the environment. By analyzing learning agents we observe that an increased controller complexity can facilitate a better interaction between an agent's body and its environment. Hence, learning requires an increased controller complexity and the controller complexity and Morphological Computation influence each other.

Submitted 1 June, 2023; v1 submitted 3 September, 2022; originally announced September 2022.

arXiv:2206.15273 [pdf, ps, other]

doi 10.1007/s41884-022-00067-9

Invariance Properties of the Natural Gradient in Overparametrised Systems

Authors: Jesse van Oostrum, Johannes Müller, Nihat Ay

Abstract: The natural gradient field is a vector field that lives on a model equipped with a distinguished Riemannian metric, e.g. the Fisher-Rao metric, and represents the direction of steepest ascent of an objective function on the model with respect to this metric. In practice, one tries to obtain the corresponding direction on the parameter space by multiplying the ordinary gradient by the inverse of the Gram matrix associated with the metric. We refer to this vector on the parameter space as the natural parameter gradient. In this paper we study when the pushforward of the natural parameter gradient is equal to the natural gradient. Furthermore we investigate the invariance properties of the natural parameter gradient. Both questions are addressed in an overparametrised setting.

Submitted 30 June, 2022; originally announced June 2022.

Journal ref: Information Geometry, Springer, 2022
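Illustration (not part of the listing): the abstract above describes the natural parameter gradient as the ordinary gradient multiplied by the inverse of the Gram matrix, and asks when its pushforward equals the natural gradient on the model. The sketch below checks this in the simplest case we could think of, a one-parameter Bernoulli model with a logit parametrisation. It is not taken from the paper and is not overparametrised; the objective and all function names are hypothetical choices made for this example.

```python
import math


def sigmoid(theta):
    """Logit parametrisation of the Bernoulli model: p = sigmoid(theta)."""
    return 1.0 / (1.0 + math.exp(-theta))


def objective_grad(p):
    """Derivative of a hypothetical objective L(p) = -(p - 0.8)**2 w.r.t. p."""
    return -2.0 * (p - 0.8)


def natural_gradient_direct(p):
    """Natural gradient computed directly in the coordinate p.

    The Fisher information of a Bernoulli(p) model is 1 / (p * (1 - p)),
    so the natural gradient is the ordinary gradient divided by it.
    """
    fisher = 1.0 / (p * (1.0 - p))
    return objective_grad(p) / fisher


def natural_gradient_via_logit(theta):
    """Pushforward of the natural parameter gradient from theta to p."""
    p = sigmoid(theta)
    dp_dtheta = p * (1.0 - p)                  # Jacobian of the parametrisation
    gram = dp_dtheta ** 2 / (p * (1.0 - p))    # Fisher metric pulled back to theta
    ordinary = objective_grad(p) * dp_dtheta   # chain rule: dL/dtheta
    natural_param = ordinary / gram            # natural parameter gradient
    return dp_dtheta * natural_param           # pushforward to the model


if __name__ == "__main__":
    theta = 0.3
    print(natural_gradient_direct(sigmoid(theta)))
    print(natural_gradient_via_logit(theta))
```

In this regular one-dimensional case the two values agree up to rounding. In an overparametrised model the Gram matrix is singular and a generalised inverse would be needed, which is the situation the paper addresses and which this sketch does not cover.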

arXiv:2108.00904 [pdf, other]

doi 10.3389/fpsyg.2021.716433

How Morphological Computation shapes Integrated Information in Embodied Agents

Authors: Carlotta Langer, Nihat Ay

Abstract: The Integrated Information Theory provides a quantitative approach to consciousness and can be applied to neural networks. An embodied agent controlled by such a network influences and is being influenced by its environment. This involves, on the one hand, morphological computation within goal directed action and, on the other hand, integrated information within the controller, the agent's brain. In this article, we combine different methods in order to examine the information flows among and within the body, the brain and the environment of an agent. This allows us to relate various information flows to each other. We test this framework in a simple experimental setup. There, we calculate the optimal policy for goal-directed behavior based on the "planning as inference" method, in which the information-geometric em-algorithm is used to optimize the likelihood of the goal. Morphological computation and integrated information are then calculated with respect to the optimal policies. Comparing the dynamics of these measures under changing morphological circumstances highlights the antagonistic relationship between these two concepts. The more morphological computation is involved, the less information integration within the brain is required. In order to determine the influence of the brain on the behavior of the agent it is necessary to additionally measure the information flow to and from the brain.

Submitted 29 November, 2021; v1 submitted 2 August, 2021; originally announced August 2021.

Journal ref: Front. Psychol. 12:716433 (2021)

arXiv:2010.09508 [pdf, ps, other]

doi 10.1007/s11071-021-06904-3

Approaching a large deviation theory for complex systems

Authors: Ugur Tirnakli, Constantino Tsallis, Nihat Ay

Abstract: The standard Large Deviation Theory (LDT) is mathematically illustrated by the Boltzmann-Gibbs factor which describes the thermal equilibrium of short-range-interacting many-body Hamiltonian systems, the velocity distribution of which is Maxwellian. It is generically applicable to systems satisfying the Central Limit Theorem (CLT). When we focus instead on stationary states of typical complex systems (e.g., classical long-range-interacting many-body Hamiltonian systems, such as self-gravitating ones), the CLT, and possibly also the LDT, need to be generalised. Specifically, when the $N\to\infty$ attractor ($N$ being the number of degrees of freedom) in the space of distributions is a $Q$-Gaussian (a nonadditive $q$-entropy-based generalisation of the standard Gaussian case, which is recovered for $Q=1$) related to a $Q$-generalised CLT, we expect the LDT probability distribution to asymptotically approach a power law. Consistently with available strong numerical indications for probabilistic models, this behaviour possibly is that associated to a $q$-exponential (defined as $e_q^x\equiv\left[1+(1-q)x\right]^{1/(1-q)}$, which is the generalisation of the standard exponential form, straightforwardly recovered for $q=1$); $q$ and $Q$ are expected to be simply connected, including the particular case $q=Q=1$. The argument of such $q$-exponential would be expected to be proportional to $N$, analogously to the thermodynamical entropy of many-body Hamiltonian systems.
We provide here numerical evidence supporting the asymptotic power-law by analysing the standard map, the coherent noise model for biological extinctions and earthquakes, the Ehrenfest dog-flea model, and the random-walk avalanches.

Submitted 23 December, 2021; v1 submitted 16 October, 2020; originally announced October 2020.

Comments: 10 pages, 3 figs

Journal ref: Nonlinear Dyn. 106 (2021) 2537
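Illustration (not part of the listing): the abstract above defines the $q$-exponential as $e_q^x\equiv\left[1+(1-q)x\right]^{1/(1-q)}$ and notes that the standard exponential is recovered for $q=1$. A minimal sketch of that definition follows. The function name, the tolerance used to detect $q=1$, and the choice to raise an error when the bracket is not positive are assumptions made for this example, not conventions taken from the paper.

```python
import math


def q_exponential(x, q, tol=1e-12):
    """Evaluate e_q^x = [1 + (1 - q) * x] ** (1 / (1 - q)).

    For q within `tol` of 1 the ordinary exponential is returned, since the
    formula is singular there and exp(x) is its limit.
    """
    if abs(q - 1.0) < tol:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    if base <= 0.0:
        # Assumption for this sketch: treat a non-positive bracket as
        # outside the domain instead of picking a cutoff convention.
        raise ValueError("1 + (1 - q) * x must be positive")
    return base ** (1.0 / (1.0 - q))


if __name__ == "__main__":
    # For q = 2 and negative arguments, e_q^(-x) = 1 / (1 + x): a power-law
    # decay, in contrast with the exponential decay obtained at q = 1.
    for x in (1.0, 10.0, 100.0):
        print(x, q_exponential(-x, 2.0), q_exponential(-x, 1.0))
```

The loop shows the qualitative point of the abstract: for $q\neq 1$ the tail decays as a power law in the argument, not exponentially.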

arXiv:2008.11430 [pdf, other]

doi 10.3390/e22101107

Complexity as Causal Information Integration

Authors: Carlotta Langer, Nihat Ay

Abstract: Complexity measures in the context of the Integrated Information Theory of consciousness try to quantify the strength of the causal connections between different neurons. This is done by minimizing the KL-divergence between a full system and one without causal connections. Various measures have been proposed and compared in this setting. We will discuss a class of information geometric measures that aim at assessing the intrinsic causal influences in a system. One promising candidate of these measures, denoted by $Φ_{CIS}$, is based on conditional independence statements and does satisfy all of the properties that have been postulated as desirable. Unfortunately it does not have a graphical representation which makes it less intuitive and difficult to analyze. We propose an alternative approach using a latent variable which models a common exterior influence. This leads to a measure $Φ_{CII}$, Causal Information Integration, that satisfies all of the required conditions. Our measure can be calculated using an iterative information geometric algorithm, the em-algorithm. Therefore we are able to compare its behavior to existing integrated information measures.

Submitted 8 February, 2021; v1 submitted 26 August, 2020; originally announced August 2020.

Journal ref: Langer C, Ay N. Complexity as Causal Information Integration. Entropy. 2020; 22(10):1107

arXiv:2008.06687 [pdf, other]

Natural Reweighted Wake-Sleep

Authors: Csongor Várady, Riccardo Volpi, Luigi Malagò, Nihat Ay

Abstract: Helmholtz Machines (HMs) are a class of generative models composed of two Sigmoid Belief Networks (SBNs), acting respectively as an encoder and a decoder. These models are commonly trained using a two-step optimization algorithm called Wake-Sleep (WS) and more recently by improved versions, such as Reweighted Wake-Sleep (RWS) and Bidirectional Helmholtz Machines (BiHM). The locality of the connections in an SBN induces sparsity in the Fisher Information Matrices associated to the probabilistic models, in the form of a finely-grained block-diagonal structure. In this paper we exploit this property to efficiently train SBNs and HMs using the natural gradient. We present a novel algorithm, called Natural Reweighted Wake-Sleep (NRWS), that corresponds to the geometric adaptation of its standard version. In a similar manner, we also introduce Natural Bidirectional Helmholtz Machine (NBiHM). Differently from previous work, we will show how for HMs the natural gradient can be efficiently computed without the need of introducing any approximation in the structure of the Fisher information matrix. The experiments performed on standard datasets from the literature show a consistent improvement of NRWS and NBiHM not only with respect to their non-geometric baselines but also with respect to state-of-the-art training algorithms for HMs. The improvement is quantified both in terms of speed of convergence as well as value of the log-likelihood reached after training.

Submitted 14 September, 2022; v1 submitted 15 August, 2020; originally announced August 2020.

Comments: 41 pages, 18 figures, to be published in Neural Networks Journal

MSC Class: 68T07

arXiv:2007.03129 [pdf, other]

Confounding Ghost Channels and Causality: A New Approach to Causal Information Flows

Authors: Nihat Ay

Abstract: Information theory provides a fundamental framework for the quantification of information flows through channels, formally Markov kernels. However, quantities such as mutual information and conditional mutual information do not necessarily reflect the causal nature of such flows. We argue that this is often the result of conditioning based on sigma algebras that are not associated with the given channels. We propose a version of the (conditional) mutual information based on families of sigma algebras that are coupled with the underlying channel. This leads to filtrations which allow us to prove a corresponding causal chain rule as a basic requirement within the presented approach.

Submitted 6 July, 2020; originally announced July 2020.

arXiv:2005.11510 [pdf, other]

The Information-Geometric Perspective of Compositional Data Analysis

Authors: Ionas Erb, Nihat Ay

Abstract: Information geometry uses the formal tools of differential geometry to describe the space of probability distributions as a Riemannian manifold with an additional dual structure. The formal equivalence of compositional data with discrete probability distributions makes it possible to apply the same description to the sample space of Compositional Data Analysis (CoDA). The latter has been formally described as a Euclidean space with an orthonormal basis featuring components that are suitable combinations of the original parts. In contrast to the Euclidean metric, the information-geometric description singles out the Fisher information metric as the only one keeping the manifold's geometric structure invariant under equivalent representations of the underlying random variables. Well-known concepts that are valid in Euclidean coordinates, e.g., the Pythagorean theorem, are generalized by information geometry to corresponding notions that hold for more general coordinates. In briefly reviewing Euclidean CoDA and, in more detail, the information-geometric approach, we show how the latter justifies the use of distance measures and divergences that so far have received little attention in CoDA as they do not fit the Euclidean geometry favored by current thinking. We also show how entropy and relative entropy can describe amalgamations in a simple way, while Aitchison distance requires the use of geometric means to obtain more succinct relationships. We proceed to prove the information monotonicity property for Aitchison distance.
We close with some thoughts about new directions in CoDA where the rich structure that is provided by information geometry could be exploited.

Submitted 27 April, 2021; v1 submitted 23 May, 2020; originally announced May 2020.

Comments: 22 pages, 3 figures

MSC Class: 94A17; 62-07; 53Z99

arXiv:2005.10791 [pdf, other]

On the Locality of the Natural Gradient for Deep Learning

Authors: Nihat Ay

Abstract: We study the natural gradient method for learning in deep Bayesian networks, including neural networks. There are two natural geometries associated with such learning systems consisting of visible and hidden units. One geometry is related to the full system, the other one to the visible sub-system. These two geometries imply different natural gradients. In a first step, we demonstrate a great simplification of the natural gradient with respect to the first geometry, due to locality properties of the Fisher information matrix. This simplification does not directly translate to a corresponding simplification with respect to the second geometry. We develop the theory for studying the relation between the two versions of the natural gradient and outline a method for the simplification of the natural gradient with respect to the second geometry based on the first one. This method suggests to incorporate a recognition model as an auxiliary model for the efficient application of the natural gradient method in deep networks.

Submitted 21 May, 2020; originally announced May 2020.

arXiv:1910.05979 [pdf, other]

Information Decomposition based on Cooperative Game Theory

Authors: Nihat Ay, Daniel Polani, Nathaniel Virgo

Abstract: We offer a new approach to the information decomposition problem in information theory: given a 'target' random variable co-distributed with multiple 'source' variables, how can we decompose the mutual information into a sum of non-negative terms that quantify the contributions of each random variable, not only individually but also in combination? We derive our composition from cooperative game theory. It can be seen as assigning a "fair share" of the mutual information to each combination of the source variables. Our decomposition is based on a different lattice from the usual 'partial information decomposition' (PID) approach, and as a consequence our decomposition has a smaller number of terms: it has analogs of the synergy and unique information terms, but lacks terms corresponding to redundancy. Because of this, it is able to obey equivalents of the axioms known as 'local positivity' and 'identity', which cannot be simultaneously satisfied by a PID measure.

Submitted 14 October, 2019; originally announced October 2019.

Comments: under review by Kybernetika journal

arXiv:1907.11122 [pdf, other]

doi 10.3390/e21090831

Canonical divergence for flat $α$-connections: Classical and Quantum

Authors: Domenico Felice, Nihat Ay

Abstract: A recent canonical divergence, which is introduced on a smooth manifold $\mathrm{M}$ endowed with a general dualistic structure $(\mathrm{g},\nabla,\nabla^*)$, is considered for flat $α$-connections. In the classical setting, we compute such a canonical divergence on the manifold of positive measures and prove that it coincides with the classical $α$-divergence. In the quantum framework, the recent canonical divergence is evaluated for the quantum $α$-connections on the manifold of all positive definite Hermitian operators. Also in this case we obtain that the recent canonical divergence is the quantum $α$-divergence.

Submitted 23 August, 2019; v1 submitted 25 July, 2019; originally announced July 2019.

Comments: 18 pages

arXiv:1903.09797 [pdf, other]

doi 10.3390/e21040435

Canonical divergence for measuring classical and quantum complexity

Authors: Domenico Felice, Stefano Mancini, Nihat Ay

Abstract: A new canonical divergence is put forward for generalizing an information-geometric measure of complexity for both classical and quantum systems. On the simplex of probability measures it is proved that the new divergence coincides with the Kullback-Leibler divergence, which is used to quantify how much a probability measure deviates from the non-interacting states that are modeled by exponential families of probabilities. On the space of positive density operators, we prove that the same divergence reduces to the quantum relative entropy, which quantifies many-party correlations of a quantum state from a Gibbs family.

Submitted 26 April, 2019; v1 submitted 23 March, 2019; originally announced March 2019.

Comments: 17 pages

Journal ref: Entropy 2019, 21(4), 435

arXiv:1903.02379 [pdf, other]

Divergence functions in Information Geometry

Authors: Domenico Felice, Nihat Ay

Abstract: A recently introduced canonical divergence $\mathcal{D}$ for a dual structure $(\mathrm{g},\nabla,\nabla^*)$ is discussed in connection to other divergence functions. Finally, open problems concerning symmetry properties are outlined.

Submitted 6 March, 2019; originally announced March 2019.

Comments: 10 pages

arXiv:1812.04461 [pdf, other]

Dynamical Systems induced by Canonical Divergence in dually flat manifolds

Authors: Domenico Felice, Nihat Ay

Abstract: The principles of classical mechanics have shown that the inertial quality of mass is characterized by the kinetic energy. This, in turn, establishes the connection between geometry and mechanics. We aim to exploit such a fundamental principle for information geometry entering the realm of mechanics. According to the modification of curve energy stated by Amari and Nagaoka for a smooth manifold $\mathrm{M}$ endowed with a dual structure $(\mathrm{g},\nabla,\nabla^*)$, we consider $\nabla$ and $\nabla^*$ kinetic energies. Then, we prove that a recently introduced canonical divergence and its dual function coincide with Hamilton principal functions associated with suitable Lagrangian functions when $(\mathrm{M},\mathrm{g},\nabla,\nabla^*)$ is dually flat. Corresponding dynamical systems are studied and the tangent dynamics is outlined in terms of the Riemannian gradient of the canonical divergence. Solutions of such dynamics are proved to be $\nabla$ and $\nabla^*$ geodesics connecting any two points sufficiently close to each other. Application to the standard Gaussian model is also investigated.

Submitted 11 December, 2018; originally announced December 2018.

Comments: 31 pages

arXiv:1806.11363 [pdf, other]

doi 10.1007/s41884-021-00047-5

Towards a Canonical Divergence within Information Geometry

Authors: Domenico Felice, Nihat Ay

Abstract: In Riemannian geometry geodesics are integral curves of the Riemannian distance gradient. We extend this classical result to the framework of Information Geometry. In particular, we prove that the rays of level-sets defined by a pseudo-distance are generated by the sum of two tangent vectors. By relying on these vectors, we propose a novel definition of a canonical divergence and its dual function. We prove that the new divergence allows one to recover a given dual structure $(\mathrm{g},\nabla,\nabla^*)$ of a dually convex set on a smooth manifold $\mathrm{M}$. Additionally, we show that this divergence coincides with the canonical divergence proposed by Ay and Amari in the case of: (a) self-duality, (b) dual flatness, (c) statistical geometric analogue of the concept of symmetric spaces in Riemannian geometry. For a dually convex set, the case (c) leads to a further comparison of the new divergence with the one introduced by Henmi and Kobayashi.

Submitted 29 June, 2021; v1 submitted 29 June, 2018; originally announced June 2018.

Comments: 71 Pages, 4 figures

Journal ref: Information Geometry (2021)

arXiv:1706.09667 [pdf, other]

doi 10.3390/e19070310

Comparing Information-Theoretic Measures of Complexity in Boltzmann Machines

Authors: Maxinder S. Kanwal, Joshua A. Grochow, Nihat Ay

Abstract: In the past three decades, many theoretical measures of complexity have been proposed to help understand complex systems. In this work, for the first time, we place these measures on a level playing field, to explore the qualitative similarities and differences between them, and their shortcomings. Specifically, using the Boltzmann machine architecture (a fully connected recurrent neural network) with uniformly distributed weights as our model of study, we numerically measure how complexity changes as a function of network dynamics and network parameters. We apply an extension of one such information-theoretic measure of complexity to understand incremental Hebbian learning in Hopfield networks, a fully recurrent architecture model of autoassociative memory. In the course of Hebbian learning, the total information flow reflects a natural upward trend in complexity as the network attempts to learn more and more patterns.

Submitted 29 July, 2017; v1 submitted 29 June, 2017; originally announced June 2017.

Comments: 16 pages, 7 figures; Appears in Entropy, Special Issue "Information Geometry II"

Journal ref: Entropy (2017), 19(7), 310

arXiv:1705.11014 [pdf, ps, other]

Congruent families and invariant tensors

Authors: Lorenz Schwachhöfer, Nihat Ay, Jürgen Jost, Hông Vân Lê

Abstract: Classical results of Chentsov and Campbell state that -- up to constant multiples -- the only $2$-tensor field of a statistical model which is invariant under congruent Markov morphisms is the Fisher metric and the only invariant $3$-tensor field is the Amari-Chentsov tensor. We generalize this result for arbitrary degree $n$, showing that any family of $n$-tensors which is invariant under congruent Markov morphisms is algebraically generated by the canonical tensor fields defined in an earlier paper.

Submitted 31 May, 2017; originally announced May 2017.

arXiv:1605.09735 [pdf, other]

Information Theoretically Aided Reinforcement Learning for Embodied Agents

Authors: Guido Montufar, Keyan Ghazi-Zahedi, Nihat Ay

Abstract: Reinforcement learning for embodied agents is a challenging problem. The accumulated reward to be optimized is often a very rugged function, and gradient methods are impaired by many local optimizers. We demonstrate, in an experimental setting, that incorporating an intrinsic reward can smoothen the optimization landscape while preserving the global optimizers of interest. We show that policy gradient optimization for locomotion in a complex morphology is significantly improved when supplementing the extrinsic reward by an intrinsic reward defined in terms of the mutual information of time consecutive sensor readings.

Submitted 31 May, 2016; originally announced May 2016.

Comments: 10 pages, 4 figures, 8 pages appendix

MSC Class: 68T05; 68T40
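
The intrinsic reward in the abstract above is described as the mutual information of time-consecutive sensor readings. As a rough illustration only, the following sketch computes a plug-in (empirical frequency) estimate of that quantity for a discrete sensor sequence. The function name `one_step_mutual_information`, the discretized input, and the plug-in estimator are assumptions made for this example; the listing does not say how the authors estimate the quantity.

```python
import math
from collections import Counter


def one_step_mutual_information(readings):
    """Plug-in estimate, in bits, of I(S_t; S_{t+1}) for a discrete sequence.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Pair every reading with its immediate successor.
    pairs = list(zip(readings[:-1], readings[1:]))
    if not pairs:
        raise ValueError("need at least two readings")
    total = len(pairs)

    joint = Counter(pairs)                      # counts of (s_t, s_{t+1})
    past = Counter(a for a, _ in pairs)         # counts of s_t
    future = Counter(b for _, b in pairs)       # counts of s_{t+1}

    mi = 0.0
    for (a, b), count in joint.items():
        # p(a, b) / (p(a) * p(b)) expressed with raw counts.
        ratio = count * total / (past[a] * future[b])
        mi += (count / total) * math.log2(ratio)
    return mi
```

A strictly alternating binary sequence is perfectly predictable one step ahead and gives one bit, while a constant sequence gives zero. Plug-in estimates are biased for short sequences, so this is a sketch rather than a recommended estimator.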

arXiv:1603.08389 [pdf, other]

doi 10.1007/s12064-015-0217-3

The Umwelt of an Embodied Agent -- A Measure-Theoretic Definition

Authors: Nihat Ay, Wolfgang Löhr

Abstract: We consider a general model of the sensorimotor loop of an agent interacting with the world. This formalises Uexküll's notion of a \emph{function-circle}. Here, we assume a particular causal structure, mechanistically described in terms of Markov kernels. In this generality, we define two $σ$-algebras of events in the world that describe two respective perspectives: (1) the perspective of an external observer, (2) the intrinsic perspective of the agent. Not all aspects of the world, seen from the external perspective, are accessible to the agent. This is expressed by the fact that the second $σ$-algebra is a subalgebra of the first one. We propose the smaller one as formalisation of Uexküll's \emph{Umwelt} concept. We show that, under continuity and compactness assumptions, the global dynamics of the world can be simplified without changing the internal process. This simplification can serve as a minimal world model that the system must have in order to be consistent with the internal process.

Submitted 28 March, 2016; originally announced March 2016.

Comments: 16 pages

Journal ref: Theory in Biosciences, 134 no. 3, pp. 105-116, 2015

arXiv:1603.07181 [pdf, other]

Iterative Scaling Algorithm for Channels

Authors: Paolo Perrone, Nihat Ay

Abstract: Here we define a procedure for evaluating KL-projections (I- and rI-projections) of channels. These can be useful in the decomposition of mutual information between input and outputs, e.g. to quantify synergies and interactions of different orders, as well as information integration and other related measures of complexity. The algorithm is a generalization of the standard iterative scaling algorithm, which we here extend from probability distributions to channels (also known as transition kernels).

Submitted 19 September, 2016; v1 submitted 23 March, 2016; originally announced March 2016.

arXiv:1512.03614 [pdf, other]

Hierarchical Quantification of Synergy in Channels

Authors: Paolo Perrone, Nihat Ay

Abstract: The decomposition of channel information into synergies of different order is an open, active problem in the theory of complex systems. Most approaches to the problem are based on information theory, and propose decompositions of mutual information between inputs and outputs in several ways, none of which is generally accepted yet. We propose a new point of view on the topic. We model a multi-input channel as a Markov kernel. We can project the channel onto a series of exponential families which form a hierarchical structure. This is carried out with tools from information geometry, in a way analogous to the projections of probability distributions introduced by Amari. A Pythagorean relation leads naturally to a decomposition of the mutual information between inputs and outputs into terms which represent single node information, pairwise interactions, and in general n-node interactions. The synergy measures introduced in this paper can be easily evaluated by an iterative scaling algorithm, which is a standard procedure in information geometry.

Submitted 11 December, 2015; originally announced December 2015.

Comments: 20 pages, 12 figures, Front. Robot. AI - Computational Intelligence 2015

arXiv:1512.00250 [pdf, other]

doi 10.3389/frobt.2016.00042

Evaluating Morphological Computation in Muscle and DC-motor Driven Models of Human Hopping

Authors: Keyan Ghazi-Zahedi, Daniel F. B. Haeufle, Guido Montufar, Syn Schmitt, Nihat Ay

Abstract: In the context of embodied artificial intelligence, morphological computation refers to processes which are conducted by the body (and environment) that otherwise would have to be performed by the brain. Exploiting environmental and morphological properties is an important feature of embodied systems. The main reason is that it allows the controller complexity to be reduced significantly. An important aspect of morphological computation is that it cannot be assigned to an embodied system per se, but that it is, as we show, behavior- and state-dependent. In this work, we evaluate two different measures of morphological computation that can be applied in robotic systems and in computer simulations of biological movement. As an example, these measures were evaluated on muscle and DC-motor driven hopping models. We show that a state-dependent analysis of the hopping behaviors provides additional insights that cannot be gained from the averaged measures alone. This work includes algorithms and computer code for the measures.

Submitted 11 December, 2015; v1 submitted 1 December, 2015; originally announced December 2015.

Comments: 10 pages, 4 figures, 1 table, 5 algorithms

MSC Class: 68T40; 97R40; 68Q30; 92C10 ACM Class: I.2; I.2.m

arXiv:1510.07305 [pdf, ps, other]

Parametrized measure models

Authors: Nihat Ay, Jürgen Jost, Hông Vân Lê, Lorenz Schwachhöfer

Abstract: We develop a new and general notion of parametric measure models and statistical models on an arbitrary sample space $Ω$ which does not assume that all measures of the model have the same null sets. This is given by a differentiable map from the parameter manifold $M$ into the set of finite measures or probability measures on $Ω$, respectively, which is differentiable when regarded as a map into the Banach space of all signed measures on $Ω$. Furthermore, we also give a rigorous definition of roots of measures and give a natural definition of the Fisher metric and the Amari-Chentsov tensor as the pullback of tensors defined on the space of roots of measures. We show that many features such as the preservation of this tensor under sufficient statistics and the monotonicity formula hold even in this very general set-up.

Submitted 14 July, 2017; v1 submitted 25 October, 2015; originally announced October 2015.

Comments: 29 pages, final version to appear in Bernoulli Journal

MSC Class: 53C99; 62B05

arXiv:1503.07206 [pdf, other]

Geometry and Determinism of Optimal Stationary Control in Partially Observable Markov Decision Processes

Authors: Guido Montufar, Keyan Ghazi-Zahedi, Nihat Ay

Abstract: It is well known that for any finite state Markov decision process (MDP) there is a memoryless deterministic policy that maximizes the expected reward. For partially observable Markov decision processes (POMDPs), optimal memoryless policies are generally stochastic. We study the expected reward optimization problem over the set of memoryless stochastic policies. We formulate this as a constrained linear optimization problem and develop a corresponding geometric framework. We show that any POMDP has an optimal memoryless policy of limited stochasticity, which allows us to reduce the dimensionality of the search space. Experiments demonstrate that this approach enables better and faster convergence of the policy gradient on the evaluated systems.

Submitted 13 February, 2016; v1 submitted 24 March, 2015; originally announced March 2015.

Comments: 25 pages, 7 figures

MSC Class: 93E20; 90C40

arXiv:1412.2447 [pdf, ps, other]

The Information Theory of Individuality

Authors: David Krakauer, Nils Bertschinger, Eckehard Olbrich, Nihat Ay, Jessica C. Flack

Abstract: We consider biological individuality in terms of information theoretic and graphical principles. Our purpose is to extract through an algorithmic decomposition system-environment boundaries supporting individuality. We infer or detect evolved individuals rather than assume that they exist. Given a set of consistent measurements over time, we discover a coarse-grained or quantized description on a system, inducing partitions (which can be nested). Legitimate individual partitions will propagate information from the past into the future, whereas spurious aggregations will not. Individuals are therefore defined in terms of ongoing, bounded information processing units rather than lists of static features or conventional replication-based definitions which tend to fail in the case of cultural change. One virtue of this approach is that it could expand the scope of what we consider adaptive or biological phenomena, particularly in the microscopic and macroscopic regimes of molecular and social phenomena.

Submitted 7 December, 2014; originally announced December 2014.

arXiv:1407.6836 [pdf, other]

doi 10.1371/journal.pcbi.1004427

A Theory of Cheap Control in Embodied Systems

Authors: Guido Montufar, Keyan Ghazi-Zahedi, Nihat Ay

Abstract: We present a framework for designing cheap control architectures for embodied agents. Our derivation is guided by the classical problem of universal approximation, whereby we explore the possibility of exploiting the agent's embodiment for a new and more efficient universal approximation of behaviors generated by sensorimotor control. This embodied universal approximation is compared with the classical non-embodied universal approximation. To exemplify our approach, we present a detailed quantitative case study for policy models defined in terms of conditional restricted Boltzmann machines. In contrast to non-embodied universal approximation, which requires an exponential number of parameters, in the embodied setting we are able to generate all possible behaviors with a drastically smaller model, thus obtaining cheap universal approximation. We test and corroborate the theory experimentally with a six-legged walking machine. The experiments show that the sufficient controller complexity predicted by our theory is tight, which means that the theory has direct practical implications. Keywords: cheap design, embodiment, sensorimotor loop, universal approximation, conditional restricted Boltzmann machine

Submitted 15 November, 2014; v1 submitted 25 July, 2014; originally announced July 2014.

Comments: 27 pages, 10 figures

MSC Class: 68T05; 60K99

arXiv:1406.3140 [pdf, other]

Expressive Power and Approximation Errors of Restricted Boltzmann Machines

Authors: Guido Montufar, Johannes Rauh, Nihat Ay

Abstract: We present explicit classes of probability distributions that can be learned by Restricted Boltzmann Machines (RBMs) depending on the number of units that they contain, and which are representative for the expressive power of the model. We use this to show that the maximal Kullback-Leibler divergence to the RBM model with $n$ visible and $m$ hidden units is bounded from above by $n - \left\lfloor \log(m+1) \right\rfloor - \frac{m+1}{2^{\left\lfloor\log(m+1)\right\rfloor}} \approx (n -1) - \log(m+1)$. In this way we can specify the number of hidden units that guarantees a sufficiently rich model containing different classes of distributions and respecting a given error tolerance.

Submitted 12 June, 2014; originally announced June 2014.

Comments: 9 pages, 3 figures, plus 1 page, 1 figure appendix, minor corrections of the first publication

MSC Class: 82C32; 68Q99

Journal ref: Advances in Neural Information Processing Systems 24, pages 415-423, 2011
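
The upper bound quoted in the abstract above can be evaluated directly for given $n$ and $m$. The sketch below does this under the assumption that $\log$ is the base-2 logarithm, which the $2^{\lfloor\log(m+1)\rfloor}$ term suggests but the abstract does not state. The function names `rbm_kl_upper_bound` and `rbm_kl_upper_bound_approx` are hypothetical and chosen only for this example.

```python
import math


def rbm_kl_upper_bound(n: int, m: int) -> float:
    """Evaluate n - floor(log2(m+1)) - (m+1) / 2**floor(log2(m+1)).

    Assumes base-2 logarithms; n is the number of visible units and m the
    number of hidden units, as in the abstract.
    """
    if n < 0 or m < 0:
        raise ValueError("n and m must be non-negative")
    # bit_length() - 1 is an exact integer floor(log2(m + 1)) for m + 1 >= 1.
    k = (m + 1).bit_length() - 1
    return n - k - (m + 1) / 2**k


def rbm_kl_upper_bound_approx(n: int, m: int) -> float:
    """Evaluate the approximation (n - 1) - log2(m + 1) from the abstract."""
    return (n - 1) - math.log2(m + 1)
```

With no hidden units (`m = 0`) the expression reduces to `n - 1`. Whenever `m + 1` is a power of two, the exact expression and the approximation coincide.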

arXiv:1406.0833 [pdf, other]

doi 10.1142/S1230161215500067

Maximizing the divergence from a hierarchical model of quantum states

Authors: Stephan Weis, Andreas Knauf, Nihat Ay, Ming-Jing Zhao

Abstract: We study many-party correlations quantified in terms of the Umegaki relative entropy (divergence) from a Gibbs family known as a hierarchical model. We derive these quantities from the maximum-entropy principle which was used earlier to define the closely related irreducible correlation. We point out differences between quantum states and probability vectors which exist in hierarchical models, in the divergence from a hierarchical model and in local maximizers of this divergence. The differences are, respectively, missing factorization, discontinuity and reduction of uncertainty. We discuss global maximizers of the mutual information of separable qubit states.

Submitted 12 February, 2015; v1 submitted 3 June, 2014; originally announced June 2014.

Comments: 18 pages, 1 figure, v2: improved exposition, v3: less typos

MSC Class: 62H20; 62F30; 94A17; 81P16; 81P45

Journal ref: Open Systems & Information Dynamics 22 (2015) 1550006

arXiv:1404.0198 [pdf, other]

doi 10.3390/e16063207

On the Fisher Metric of Conditional Probability Polytopes

Authors: Guido Montufar, Johannes Rauh, Nihat Ay

Abstract: We consider three different approaches to define natural Riemannian metrics on polytopes of stochastic matrices. First, we define a natural class of stochastic maps between these polytopes and give a metric characterization of Chentsov type in terms of invariance with respect to these maps. Second, we consider the Fisher metric defined on arbitrary polytopes through their embeddings as exponential families in the probability simplex. We show that these metrics can also be characterized by an invariance principle with respect to morphisms of exponential families. Third, we consider the Fisher metric resulting from embedding the polytope of stochastic matrices in a simplex of joint distributions by specifying a marginal distribution. All three approaches result in slight variations of products of Fisher metrics. This is consistent with the nature of polytopes of stochastic matrices, which are Cartesian products of probability simplices. The first approach yields a scaled product of Fisher metrics; the second, a product of Fisher metrics; and the third, a product of Fisher metrics scaled by the marginal distribution.

Submitted 6 June, 2014; v1 submitted 1 April, 2014; originally announced April 2014.

Comments: 26 pages, 2 figures

MSC Class: 53C99

Journal ref: Entropy. 2014; 16(6):3207-3233

arXiv:1402.3346 [pdf, other]

Geometry and Expressive Power of Conditional Restricted Boltzmann Machines

Authors: Guido Montufar, Nihat Ay, Keyan Ghazi-Zahedi

Abstract: Conditional restricted Boltzmann machines are undirected stochastic neural networks with a layer of input and output units connected bipartitely to a layer of hidden units. These networks define models of conditional probability distributions on the states of the output units given the states of the input units, parametrized by interaction weights and biases. We address the representational power of these models, proving results on their ability to represent conditional Markov random fields and conditional distributions with restricted supports, the minimal size of universal approximators, the maximal model approximation errors, and on the dimension of the set of representable conditional distributions. We contribute new tools for investigating conditional probability models, which allow us to improve the results that can be derived from existing work on restricted Boltzmann machine probability models.

Submitted 12 March, 2015; v1 submitted 13 February, 2014; originally announced February 2014.

Comments: 30 pages, 5 figures, 1 algorithm

MSC Class: 60K99; 68T05; 68R05

arXiv:1311.2852 [pdf, other]

doi 10.3390/e16042161

Quantifying unique information

Authors: Nils Bertschinger, Johannes Rauh, Eckehard Olbrich, Jürgen Jost, Nihat Ay

Abstract: We propose new measures of shared information, unique information and synergistic information that can be used to decompose the multi-information of a pair of random variables $(Y,Z)$ with a third random variable $X$. Our measures are motivated by an operational idea of unique information which suggests that shared information and unique information should depend only on the pair marginal distributions of $(X,Y)$ and $(X,Z)$. Although this invariance property has not been studied before, it is satisfied by other proposed measures of shared information. The invariance property does not uniquely determine our new measures, but it implies that the functions that we define are bounds to any other measures satisfying the same invariance property. We study properties of our measures and compare them to other candidate measures.

Submitted 15 January, 2014; v1 submitted 12 November, 2013; originally announced November 2013.

Comments: 24 pages, 2 figures. Version 2 contains less typos than version 1

MSC Class: 94A15; 94A17

Journal ref: Entropy, 16 (2014) 4, p. 2161-2183

arXiv:1309.6989 [pdf, other]

Linear combination of one-step predictive information with an external reward in an episodic policy gradient setting: a critical analysis

Authors: Keyan Zahedi, Georg Martius, Nihat Ay

Abstract: One of the main challenges in the field of embodied artificial intelligence is the open-ended autonomous learning of complex behaviours. Our approach is to use task-independent, information-driven intrinsic motivation(s) to support task-dependent learning. The work presented here is a preliminary step in which we investigate the predictive information (the mutual information of the past and future of the sensor stream) as an intrinsic drive, ideally supporting any kind of task acquisition. Previous experiments have shown that the predictive information (PI) is a good candidate to support autonomous, open-ended learning of complex behaviours, because a maximisation of the PI corresponds to an exploration of morphology- and environment-dependent behavioural regularities. The idea is that these regularities can then be exploited in order to solve any given task. Three different experiments are presented and their results lead to the conclusion that the linear combination of the one-step PI with an external reward function is not generally recommended in an episodic policy gradient setting. Only for hard tasks can a great speed-up be achieved, at the cost of a loss in asymptotic performance.

Submitted 26 September, 2013; originally announced September 2013.

arXiv:1303.0268 [pdf, other]

doi 10.1007/978-3-642-40020-9_85

Maximal Information Divergence from Statistical Models defined by Neural Networks

Authors: Guido Montufar, Johannes Rauh, Nihat Ay

Abstract: We review recent results about the maximal values of the Kullback-Leibler information divergence from statistical models defined by neural networks, including naive Bayes models, restricted Boltzmann machines, deep belief networks, and various classes of exponential families. We illustrate approaches to compute the maximal divergence from a given model starting from simple sub- or super-models. We give a new result for deep and narrow belief networks with finite-valued units.

Submitted 1 March, 2013; originally announced March 2013.

Comments: 8 pages, 1 figure

MSC Class: 62E17; 94A17; 60E05

Journal ref: Geometric science of information : first international conference, GSI 2013, Paris, France, August 28-30, 2013. Proceedings / F. Nielsen... (eds.). Springer, 2013. - P. 759-766

arXiv:1301.7473 [pdf, other]

doi 10.1371/journal.pone.0063400

Information driven self-organization of complex robotic behaviors

Authors: Georg Martius, Ralf Der, Nihat Ay

Abstract: Information theory is a powerful tool to express principles to drive autonomous systems because it is domain invariant and allows for an intuitive interpretation. This paper studies the use of the predictive information (PI), also called excess entropy or effective measure complexity, of the sensorimotor process as a driving force to generate behavior. We study nonlinear and nonstationary systems and introduce the time-local predicting information (TiPI) which allows us to derive exact results together with explicit update rules for the parameters of the controller in the dynamical systems framework. In this way the information principle, formulated at the level of behavior, is translated to the dynamics of the synapses. We underpin our results with a number of case studies with high-dimensional robotic systems. We show the spontaneous cooperativity in a complex physical system with decentralized control. Moreover, a jointly controlled humanoid robot develops a high behavioral variety depending on its physics and the environment it is dynamically embedded into. The behavior can be decomposed into a succession of low-dimensional modes that increasingly explore the behavior space. This is a promising way to avoid the curse of dimensionality which hinders learning systems from scaling well.

Submitted 27 March, 2013; v1 submitted 30 January, 2013; originally announced January 2013.

Comments: 29 pages, 12 figures

MSC Class: 94A15; 94A17; 37N35; 68T05; 68T40 ACM Class: I.2.9; H.1.1; I.2.6

Journal ref: PLoS ONE 8(5): e63400

arXiv:1301.6975 [pdf, other]

doi 10.3390/e15051887

Quantifying Morphological Computation

Authors: Keyan Zahedi, Nihat Ay

Abstract: The field of embodied intelligence emphasises the importance of the morphology and environment with respect to the behaviour of a cognitive system. The contribution of the morphology to the behaviour, commonly known as morphological computation, is well-recognised in this community. We believe that the field would benefit from a formalisation of this concept as we would like to ask how much the morphology and the environment contribute to an embodied agent's behaviour, or how an embodied agent can maximise the exploitation of its morphology within its environment. In this work we derive two concepts of measuring morphological computation, and we discuss their relation to the Information Bottleneck Method. The first concept asks how much the world contributes to the overall behaviour and the second concept asks how much the agent's action contributes to a behaviour. Various measures are derived from the concepts and validated in two experiments which highlight their strengths and weaknesses.

Submitted 20 June, 2013; v1 submitted 29 January, 2013; originally announced January 2013.

Journal ref: Entropy. 2013; 15(5):1887-1915

arXiv:1210.7719 [pdf, ps, other]

doi 10.1007/s12064-013-0186-3

Robustness, Canalyzing Functions and Systems Design

Authors: Johannes Rauh, Nihat Ay

Abstract: We study a notion of robustness of a Markov kernel that describes a system of several input random variables and one output random variable. Robustness requires that the behaviour of the system does not change if one or several of the input variables are knocked out. If the system is required to be robust against too many knockouts, then the output variable cannot distinguish reliably between input states and must be independent of the input. We study how many input states the output variable can distinguish as a function of the required level of robustness. Gibbs potentials allow a mechanistic description of the behaviour of the system after knockouts. Robustness imposes structural constraints on these potentials. We show that interaction families of Gibbs potentials allow one to describe robust systems. Given a distribution of the input random variables and the Markov kernel describing the system, we obtain a joint probability distribution. Robustness implies a number of conditional independence statements for this joint distribution. The set of all probability distributions corresponding to robust systems can be decomposed into a finite union of components, and we find parametrizations of the components. The decomposition corresponds to a primary decomposition of the conditional independence ideal and can be derived from more general results about generalized binomial edge ideals.

Submitted 29 October, 2012; originally announced October 2012.

Comments: 20 pages, 5 figures. arXiv admin note: substantial text overlap with arXiv:1110.1338

MSC Class: 93B51

Journal ref: Theory in Biosciences 133 (2), 2014, p. 63-78

arXiv:1207.6736 [pdf, ps, other]

doi 10.1007/s00440-014-0574-8

Information geometry and sufficient statistics

Authors: Nihat Ay, Jürgen Jost, Hông Vân Lê, Lorenz Schwachhöfer

Abstract: Information geometry provides a geometric approach to families of statistical models. The key geometric structures are the Fisher quadratic form and the Amari-Chentsov tensor. In statistics, the notion of sufficient statistic expresses the criterion for passing from one model to another without loss of information. This leads to the question how the geometric structures behave under such sufficient statistics. While this is well studied in the finite sample size case, in the infinite case, we encounter technical problems concerning the appropriate topologies. Here, we introduce notions of parametrized measure models and tensor fields on them that exhibit the right behavior under statistical transformations. Within this framework, we can then handle the topological issues and show that the Fisher metric and the Amari-Chentsov tensor on statistical models in the class of symmetric 2-tensor fields and 3-tensor fields can be uniquely (up to a constant) characterized by their invariance under sufficient statistics, thereby achieving a full generalization of the original result of Chentsov to infinite sample sizes. More generally, we decompose Markov morphisms between statistical models in terms of statistics. In particular, a monotonicity result for the Fisher information naturally follows.

Submitted 4 December, 2013; v1 submitted 28 July, 2012; originally announced July 2012.

Comments: 37 p, final version, minor corrections, improved presentation

MSC Class: 53C99; 62B05

Journal ref: Probability Theory and Related Fields: Volume 162, Issue 1 (2015), Page 327-364

arXiv:1110.1338 [pdf, ps, other]

Robustness and Conditional Independence Ideals

Authors: Johannes Rauh, Nihat Ay

Abstract: We study notions of robustness of Markov kernels and probability distribution of a system that is described by $n$ input random variables and one output random variable. Markov kernels can be expanded in a series of potentials that allow to describe the system's behaviour after knockouts. Robustness imposes structural constraints on these potentials. Robustness of probability distributions is defined via conditional independence statements. These statements can be studied algebraically. The corresponding conditional independence ideals are related to binary edge ideals. The set of robust probability distributions lies on an algebraic variety. We compute a Gröbner basis of this ideal and study the irreducible decomposition of the variety. These algebraic results allow to parametrize the set of all robust probability distributions.

Submitted 6 October, 2011; originally announced October 2011.

Comments: 16 pages

MSC Class: 13P25; 13P10; 62H20

arXiv:1108.3984 [pdf, ps, other]

doi 10.1142/S1230161212500072

Process Dimension of Classical and Non-Commutative Processes

Authors: Wolfgang Löhr, Arleta Szkoła, Nihat Ay

Abstract: We treat observable operator models (OOM) and their non-commutative generalisation, which we call NC-OOMs. A natural characteristic of a stochastic process in the context of classical OOM theory is the process dimension. We investigate its properties within the more general formulation, which allows to consider process dimension as a measure of complexity of non-commutative processes: We prove lower semi-continuity, and derive an ergodic decomposition formula. Further, we obtain results on the close relationship between the canonical OOM and the concept of causal states which underlies the definition of statistical complexity. In particular, the topological statistical complexity, i.e. the logarithm of the number of causal states, turns out to be an upper bound to the logarithm of process dimension.

Submitted 19 August, 2011; originally announced August 2011.

Comments: 8 pages

Journal ref: Open Syst. Inf. Dyn. 19(1), 2012

arXiv:1010.5720 [pdf, ps, other]

Information-theoretic inference of common ancestors

Authors: Bastian Steudel, Nihat Ay

Abstract: A directed acyclic graph (DAG) partially represents the conditional independence structure among observations of a system if the local Markov condition holds, that is, if every variable is independent of its non-descendants given its parents. In general, there is a whole class of DAGs that represents a given set of conditional independence relations. We are interested in properties of this class that can be derived from observations of a subsystem only. To this end, we prove an information theoretic inequality that allows for the inference of common ancestors of observed parts in any DAG representing some unknown larger system. More explicitly, we show that a large amount of dependence in terms of mutual information among the observations implies the existence of a common ancestor that distributes this information. Within the causal interpretation of DAGs our result can be seen as a quantitative extension of Reichenbach's Principle of Common Cause to more than two variables. Our conclusions are valid also for non-probabilistic observations such as binary strings, since we state the proof for an axiomatized notion of mutual information that includes the stochastic as well as the algorithmic version.

Submitted 27 October, 2010; originally announced October 2010.

Comments: 18 pages, 4 figures

arXiv:1005.1593 [pdf, ps, other]

Refinements of Universal Approximation Results for Deep Belief Networks and Restricted Boltzmann Machines

Authors: Guido Montufar, Nihat Ay

Abstract: We improve recently published results about resources of Restricted Boltzmann Machines (RBM) and Deep Belief Networks (DBN) required to make them Universal Approximators. We show that any distribution p on the set of binary vectors of length n can be arbitrarily well approximated by an RBM with k-1 hidden units, where k is the minimal number of pairs of binary vectors differing in only one entry such that their union contains the support set of p. In important cases this number is half of the cardinality of the support set of p. We construct a DBN with 2^n/2(n-b), b ~ log(n), hidden layers of width n that is capable of approximating any distribution on {0,1}^n arbitrarily well. This confirms a conjecture presented by Le Roux and Bengio 2010.

Submitted 26 July, 2010; v1 submitted 10 May, 2010; originally announced May 2010.

arXiv:1001.2686 [pdf, ps, other]

doi 10.3390/e13061200

Effective complexity of stationary process realizations

Authors: Nihat Ay, Markus Mueller, Arleta Szkola

Abstract: The concept of effective complexity of an object as the minimal description length of its regularities has been initiated by Gell-Mann and Lloyd. The regularities are modeled by means of ensembles, that is probability distributions on finite binary strings. In our previous paper we propose a definition of effective complexity in precise terms of algorithmic information theory. Here we investigate the effective complexity of binary strings generated by stationary, in general not computable, processes. We show that under not too strong conditions long typical process realizations are effectively simple. Our results become most transparent in the context of coarse effective complexity which is a modification of the original notion of effective complexity that uses less parameters in its definition. A similar modification of the related concept of sophistication has been suggested by Antunes and Fortnow.

Submitted 5 April, 2011; v1 submitted 15 January, 2010; originally announced January 2010.

Comments: 14 pages, no figures

arXiv:0912.4450 [pdf, other]

doi 10.1140/epjb/e2010-00209-0

Quantifying structure in networks

Authors: Eckehard Olbrich, Thomas Kahle, Nils Bertschinger, Nihat Ay, Juergen Jost

Abstract: We investigate exponential families of random graph distributions as a framework for systematic quantification of structure in networks. In this paper we restrict ourselves to undirected unlabeled graphs. For these graphs, the counts of subgraphs with no more than k links are a sufficient statistics for the exponential families of graphs with interactions between at most k links. In this framework we investigate the dependencies between several observables commonly used to quantify structure in networks, such as the degree distribution, cluster and assortativity coefficients.

Submitted 22 December, 2009; originally announced December 2009.

Comments: 17 pages, 3 figures

Journal ref: The European Physical Journal, Volume 77 (2010), Issue 2, pp 239-247

arXiv:0910.2039 [pdf, other]

Higher coordination with less control - A result of information maximization in the sensorimotor loop

Authors: Keyan Zahedi, Nihat Ay, Ralf Der

Abstract: This work presents a novel learning method in the context of embodied artificial intelligence and self-organization, which has as few assumptions and restrictions as possible about the world and the underlying model. The learning rule is derived from the principle of maximizing the predictive information in the sensorimotor loop. It is evaluated on robot chains of varying length with individually controlled, non-communicating segments. The comparison of the results shows that maximizing the predictive information per wheel leads to a higher coordinated behavior of the physically connected robots compared to a maximization per robot. Another focus of this paper is the analysis of the effect of the robot chain length on the overall behavior of the robots. It will be shown that longer chains with less capable controllers outperform those of shorter length and more complex controllers. The reason is found and discussed in the information-geometric interpretation of the learning process.

Submitted 18 May, 2010; v1 submitted 11 October, 2009; originally announced October 2009.

arXiv:0906.5462 [pdf, ps, other]

doi 10.1016/j.ijar.2011.01.013

Support Sets in Exponential Families and Oriented Matroid Theory

Authors: Johannes Rauh, Thomas Kahle, Nihat Ay

Abstract: The closure of a discrete exponential family is described by a finite set of equations corresponding to the circuits of an underlying oriented matroid. These equations are similar to the equations used in algebraic statistics, although they need not be polynomial in the general case. This description allows for a combinatorial study of the possible support sets in the closure of an exponential family. If two exponential families induce the same oriented matroid, then their closures have the same support sets. Furthermore, the positive cocircuits give a parameterization of the closure of the exponential family.

Submitted 15 September, 2011; v1 submitted 30 June, 2009; originally announced June 2009.

Comments: 27 pages, extended version published in IJAR

MSC Class: 52C40; 62B05; 14P15

Journal ref: International Journal of Approximate Reasoning Volume 52, Issue 5, July 2011, Pages 613-626

arXiv:0810.5663 [pdf, ps, other]

doi 10.1109/TIT.2010.2053892

Effective Complexity and its Relation to Logical Depth

Authors: Nihat Ay, Markus Mueller, Arleta Szkola

Abstract: Effective complexity measures the information content of the regularities of an object. It has been introduced by M. Gell-Mann and S. Lloyd to avoid some of the disadvantages of Kolmogorov complexity, also known as algorithmic information content. In this paper, we give a precise formal definition of effective complexity and rigorous proofs of its basic properties. In particular, we show that incompressible binary strings are effectively simple, and we prove the existence of strings that have effective complexity close to their lengths. Furthermore, we show that effective complexity is related to Bennett's logical depth: If the effective complexity of a string $x$ exceeds a certain explicit threshold then that string must have astronomically large depth; otherwise, the depth can be arbitrarily small.

Submitted 31 October, 2008; originally announced October 2008.

Comments: 14 pages, 2 figures

Journal ref: IEEE Trans. Inf. Th., Vol. 56/9 pp. 4593-4607 (2010)

arXiv:0806.2552 [pdf, ps, other]

doi 10.1103/PhysRevE.79.026201

Complexity Measures from Interaction Structures

Authors: Thomas Kahle, Eckehard Olbrich, Juergen Jost, Nihat Ay

Abstract: We evaluate new complexity measures on the symbolic dynamics of coupled tent maps and cellular automata. These measures quantify complexity in terms of $k$-th order statistical dependencies that cannot be reduced to interactions between $k-1$ units. We demonstrate that these measures are able to identify complex dynamical regimes.

Submitted 26 November, 2008; v1 submitted 16 June, 2008; originally announced June 2008.

Comments: 11 pages, figures improved, minor changes to the text

Journal ref: Phys. Rev. E 79 (2009), 026201

Showing 1–50 of 54 results for author: Ay, N