Search | arXiv e-print repository

arXiv:2005.06364 [pdf, other]

Adaptive Smoothing Path Integral Control

Authors: Dominik Thalmeier, Hilbert J. Kappen, Simone Totaro, Vicenç Gómez

Abstract: In Path Integral control problems a representation of an optimally controlled dynamical system can be formally computed and can serve as a guidepost to learn a parametrized policy. The Path Integral Cross-Entropy (PICE) method tries to exploit this, but is hampered by poor sample efficiency. We propose a model-free algorithm called ASPIC (Adaptive Smoothing of Path Integral Control) that applies an inf-convolution to the cost function to speed up convergence of policy optimization. We identify PICE as the infinite smoothing limit of this technique and show that the sample efficiency problems that PICE suffers disappear for finite levels of smoothing. For zero smoothing this method becomes a greedy optimization of the cost, which is the standard approach in current reinforcement learning. We show analytically and empirically that intermediate levels of smoothing are optimal, which renders the new method superior to both PICE and direct cost-optimization.

Submitted 13 May, 2020; originally announced May 2020.

Comments: 23 pages, 5 figures, NeurIPS 2019 Optimization Foundations of Reinforcement Learning Workshop (OptRL 2019)

arXiv:1710.09825 [pdf, other]

doi 10.1103/PhysRevLett.120.268103

On the role of synaptic stochasticity in training low-precision neural networks

Authors: Carlo Baldassi, Federica Gerace, Hilbert J. Kappen, Carlo Lucibello, Luca Saglietti, Enzo Tartaglione, Riccardo Zecchina

Abstract: Stochasticity and limited precision of synaptic weights in neural network models are key aspects of both biological and hardware modeling of learning processes. Here we show that a neural network model with stochastic binary weights naturally gives prominence to exponentially rare dense regions of solutions with a number of desirable properties such as robustness and good generalization performance, while typical solutions are isolated and hard to find. Binary solutions of the standard perceptron problem are obtained from a simple gradient descent procedure on a set of real values parametrizing a probability distribution over the binary synapses. Both analytical and numerical results are presented. An algorithmic extension aimed at training discrete deep neural networks is also investigated.

Submitted 19 March, 2018; v1 submitted 26 October, 2017; originally announced October 2017.

Comments: 7 pages + 14 pages of supplementary material

Journal ref: Phys. Rev. Lett. 120, 268103 (2018)

arXiv:1606.07777 [pdf, other]

doi 10.1088/1751-8121/50/3/034006

Action selection in growing state spaces: Control of Network Structure Growth

Authors: Dominik Thalmeier, Vicenç Gómez, Hilbert J. Kappen

Abstract: The dynamical processes taking place on a network depend on its topology. Influencing the growth process of a network therefore has important implications on such dynamical processes. We formulate the problem of influencing the growth of a network as a stochastic optimal control problem in which a structural cost function penalizes undesired topologies. We approximate this control problem with a restricted class of control problems that can be solved using probabilistic inference methods. To deal with the increasing problem dimensionality, we introduce an adaptive importance sampling method for approximating the optimal control. We illustrate this methodology in the context of formation of information cascades, considering the task of influencing the structure of a growing conversation thread, as in Internet forums. Using a realistic model of growing trees, we show that our approach can yield conversation threads with better structural properties than the ones observed without control.

Submitted 27 December, 2016; v1 submitted 23 June, 2016; originally announced June 2016.

Comments: 23 pages, 7 figures

Journal ref: Journal of Physics A: Mathematical and Theoretical, Volume 50, Number 3, 034006, 2017

arXiv:1605.00278 [pdf, other]

doi 10.1109/TSP.2017.2686340

Particle Smoothing for Hidden Diffusion Processes: Adaptive Path Integral Smoother

Authors: H. -Ch. Ruiz, H. J. Kappen

Abstract: Particle smoothing methods are used for inference of stochastic processes based on noisy observations. Typically, the estimation of the marginal posterior distribution given all observations is cumbersome and computationally intensive. In this paper, we propose a simple algorithm based on path integral control theory to estimate the smoothing distribution of continuous-time diffusion processes with partial observations. In particular, we use an adaptive importance sampling method to improve the effective sampling size of the posterior over processes given the observations and the reliability of the estimation of the marginals. This is achieved by estimating a feedback controller to sample efficiently from the joint smoothing distributions. We compare the results with estimations obtained from the standard Forward Filter/Backward Simulator for two diffusion processes of different complexity. We show that the proposed method gives more reliable estimations than the standard FFBSi when the smoothing distribution is poorly represented by the filter distribution.

Submitted 6 March, 2017; v1 submitted 1 May, 2016; originally announced May 2016.

Comments: 16 pages, 13 figures

arXiv:1505.01874 [pdf, ps, other]

doi 10.1007/s10955-016-1446-7

Adaptive importance sampling for control and inference

Authors: Hilbert Johan Kappen, Hans Christian Ruiz

Abstract: Path integral (PI) control problems are a restricted class of non-linear control problems that can be solved formally as a Feynman-Kac path integral and can be estimated using Monte Carlo sampling. In this contribution we review path integral control theory in the finite horizon case. We subsequently focus on the problem of how to compute and represent control solutions. Within the PI theory, the question of how to compute becomes the question of importance sampling. Efficient importance samplers are state feedback controllers and the use of these requires an efficient representation. Learning and representing effective state-feedback controllers for non-linear stochastic control problems is a very challenging, and largely unsolved, problem. We show how to learn and represent such controllers using ideas from the cross entropy method. We derive a gradient descent method that allows one to learn feedback controllers using an arbitrary parametrisation. We refer to this method as the Path Integral Cross Entropy method or PICE. We illustrate this method for some simple examples. The path integral control methods can be used to estimate the posterior distribution in latent state models. In neuroscience these problems arise when estimating connectivity from neural recording data using EM. We demonstrate the path integral control method as an accurate alternative to particle filtering.

Submitted 2 September, 2015; v1 submitted 7 May, 2015; originally announced May 2015.

Comments: 23 pages, 4 figures

arXiv:1502.04548 [pdf, other]

Real-Time Stochastic Optimal Control for Multi-agent Quadrotor Systems

Authors: Vicenç Gómez, Sep Thijssen, Andrew Symington, Stephen Hailes, Hilbert J. Kappen

Abstract: This paper presents a novel method for controlling teams of unmanned aerial vehicles using Stochastic Optimal Control (SOC) theory. The approach consists of a centralized high-level planner that computes optimal state trajectories as velocity sequences, and a platform-specific low-level controller which ensures that these velocity sequences are met. The planning task is expressed as a centralized path-integral control problem, for which optimal control computation corresponds to a probabilistic inference problem that can be solved by efficient sampling methods. Through simulation we show that our SOC approach (a) has significant benefits compared to deterministic control and other SOC methods in multimodal problems with noise-dependent optimal solutions, (b) is capable of controlling a large number of platforms in real-time, and (c) yields collective emergent behaviour in the form of flight formations. Finally, we show that our approach works for real platforms, by controlling a team of three quadrotors in outdoor conditions.

Submitted 12 May, 2020; v1 submitted 16 February, 2015; originally announced February 2015.

Comments: 17 pages, 8 figures, 26th International Conference on Automated Planning and Scheduling

arXiv:1406.0993 [pdf, ps, other]

Latent Kullback Leibler Control for Continuous-State Systems using Probabilistic Graphical Models

Authors: Takamitsu Matsubara, Vicenç Gómez, Hilbert J. Kappen

Abstract: Kullback Leibler (KL) control problems allow for efficient computation of optimal control by solving a principal eigenvector problem. However, direct applicability of such a framework to continuous state-action systems is limited. In this paper, we propose to embed a KL control problem in a probabilistic graphical model where observed variables correspond to the continuous (possibly high-dimensional) state of the system and latent variables correspond to a discrete (low-dimensional) representation of the state amenable to KL control computation. We present two examples of this approach. The first one uses standard hidden Markov models (HMMs) and computes exact optimal control, but is only applicable to low-dimensional systems. The second one uses factorial HMMs; it is scalable to higher dimensional problems, but control computation is approximate. We illustrate both examples in several robot motor control tasks.

Submitted 27 August, 2014; v1 submitted 4 June, 2014; originally announced June 2014.

Comments: 9 pages, 5 figures, accepted in Uncertainty in Artificial Intelligence (UAI '14)

ACM Class: I.2.8; I.2.9; G.3

arXiv:1209.5656 [pdf, ps, other]

Learning Price-Elasticity of Smart Consumers in Power Distribution Systems

Authors: Vicenç Gómez, Michael Chertkov, Scott Backhaus, Hilbert J. Kappen

Abstract: Demand Response is an emerging technology which will transform the power grid of tomorrow. It is revolutionary, not only because it will enable peak load shaving and will add resources to manage large distribution systems, but mainly because it will tap into an almost unexplored and extremely powerful pool of resources comprised of many small individual consumers on distribution grids. However, to utilize these resources effectively, the methods used to engage these resources must yield accurate and reliable control. A diversity of methods have been proposed to engage these new resources. As opposed to direct load control, many methods rely on consumers and/or loads responding to exogenous signals, typically in the form of energy pricing, originating from the utility or system operator. Here, we propose an open loop communication-lite method for estimating the price elasticity of many customers comprising a distribution system. We utilize a sparse linear regression method that relies on operator-controlled, inhomogeneous minor price variations, which will be fair to all the consumers. Our numerical experiments show that reliable estimation of individual and thus aggregated instantaneous elasticities is possible. We describe the limits of the reliable reconstruction as functions of the three key parameters of the system: (i) ratio of the number of communication slots (time units) per number of engaged consumers; (ii) level of sparsity (in consumer response); and (iii) signal-to-noise ratio.

Submitted 25 September, 2012; originally announced September 2012.

Comments: 6 pages, 5 figures, IEEE SmartGridComm 2012

ACM Class: C.2.1; G.3

arXiv:1203.0652 [pdf, ps, other]

doi 10.1007/s11280-012-0162-8

A likelihood-based framework for the analysis of discussion threads

Authors: Vicenç Gómez, Hilbert J. Kappen, Nelly Litvak, Andreas Kaltenbrunner

Abstract: Online discussion threads are conversational cascades in the form of posted messages that can be generally found in social systems that comprise many-to-many interaction such as blogs, news aggregators or bulletin board systems. We propose a framework based on generative models of growing trees to analyse the structure and evolution of discussion threads. We consider the growth of a discussion to be determined by an interplay between popularity, novelty and a trend (or bias) to reply to the thread originator. The relevance of these features is estimated using a full likelihood approach, which allows one to characterize the habits and communication patterns of a given platform and/or community.

Submitted 3 March, 2012; originally announced March 2012.

Comments: 31 pages, 12 figures, journal

ACM Class: G.3; H.5.4

arXiv:1109.0486 [pdf, ps, other]

The Variational Garrote

Authors: Hilbert J. Kappen, Vicenç Gómez

Abstract: In this paper, we present a new variational method for sparse regression using $L_0$ regularization. The variational parameters appear in the approximate model in a way that is similar to Breiman's Garrote model. We refer to this method as the variational Garrote (VG). We show that the combination of the variational approximation and $L_0$ regularization has the effect of making the problem effectively of maximal rank even when the number of samples is small compared to the number of variables. The VG is compared numerically with the Lasso method, ridge regression and the recently introduced paired mean field method (PMF) (M. Titsias & M. Lázaro-Gredilla, NIPS 2012). Numerical results show that the VG and PMF yield more accurate predictions and more accurately reconstruct the true model than the other methods. It is shown that the VG finds correct solutions when the Lasso solution is inconsistent due to large input correlations. Globally, VG is significantly faster than PMF and tends to perform better as the problems become denser and in problems with strongly correlated inputs. The naive implementation of the VG scales cubically with the number of features. By introducing Lagrange multipliers we obtain a dual formulation of the problem that scales cubically in the number of samples, but close to linearly in the number of features.

Submitted 12 November, 2012; v1 submitted 2 September, 2011; originally announced September 2011.

Comments: 26 pages, 11 figures

arXiv:1011.0673 [pdf, ps, other]

doi 10.1145/1995966.1995992

Modeling the structure and evolution of discussion cascades

Authors: Vicenç Gómez, Hilbert J. Kappen, Andreas Kaltenbrunner

Abstract: We analyze the structure and evolution of discussion cascades in four popular websites: Slashdot, Barrapunto, Meneame and Wikipedia. Despite the big heterogeneities between these sites, a preferential attachment (PA) model with bias to the root can capture the temporal evolution of the observed trees and many of their statistical properties, namely, probability distributions of the branching factors (degrees), subtree sizes and certain correlations. The parameters of the model are learned efficiently using a novel maximum likelihood estimation scheme for PA and provide a figurative interpretation about the communication habits and the resulting discussion cascades on the four different websites.

Submitted 15 April, 2011; v1 submitted 2 November, 2010; originally announced November 2010.

Comments: 10 pages, 11 figures

ACM Class: J.4; G.2.2

Journal ref: 22nd ACM conference on hypertext and hypermedia (HT 2011)

arXiv:1004.2027 [pdf, ps, other]

Dynamic Policy Programming

Authors: Mohammad Gheshlaghi Azar, Vicenc Gomez, Hilbert J. Kappen

Abstract: In this paper, we propose a novel policy iteration method, called dynamic policy programming (DPP), to estimate the optimal policy in infinite-horizon Markov decision processes. We prove the finite-iteration and asymptotic $\ell_\infty$-norm performance-loss bounds for DPP in the presence of approximation/estimation error. The bounds are expressed in terms of the $\ell_\infty$-norm of the average accumulated error as opposed to the $\ell_\infty$-norm of the error in the case of the standard approximate value iteration (AVI) and the approximate policy iteration (API). This suggests that DPP can achieve a better performance than AVI and API since it averages out the simulation noise caused by Monte-Carlo sampling throughout the learning process. We examine these theoretical results numerically by comparing the performance of the approximate variants of DPP with existing reinforcement learning (RL) methods on different problem domains. Our results show that, in all cases, DPP-based algorithms outperform other RL methods by a wide margin.

Submitted 6 September, 2011; v1 submitted 12 April, 2010; originally announced April 2010.

Comments: Submitted to Journal of Machine Learning Research

arXiv:0901.0786 [pdf, ps, other]

Approximate inference on planar graphs using Loop Calculus and Belief Propagation

Authors: V. Gómez, H. J. Kappen, M. Chertkov

Abstract: We introduce novel results for approximate inference on planar graphical models using the loop calculus framework. The loop calculus (Chertkov and Chernyak, 2006) allows one to express the exact partition function of a graphical model as a finite sum of terms that can be evaluated once the belief propagation (BP) solution is known. In general, full summation over all correction terms is intractable. We develop an algorithm for the approach presented in (Chertkov et al., 2008) which represents an efficient truncation scheme on planar graphs and a new representation of the series in terms of Pfaffians of matrices. We analyze the performance of the algorithm for the partition function approximation for models with binary variables and pairwise interactions on grids and other planar graphs. We study in detail both the loop series and the equivalent Pfaffian series and show that the first term of the Pfaffian series for the general, intractable planar model, can provide very accurate approximations. The algorithm outperforms previous truncation schemes of the loop series and is competitive with other state-of-the-art methods for approximate inference.

Submitted 25 May, 2009; v1 submitted 7 January, 2009; originally announced January 2009.

Comments: 23 pages, 10 figures. Submitted to Journal of Machine Learning Research. Proceedings version accepted for UAI 2009

arXiv:cs/0612109 [pdf, ps, other]

Truncating the loop series expansion for Belief Propagation

Authors: Vicenc Gomez, J. M. Mooij, H. J. Kappen

Abstract: Recently, M. Chertkov and V.Y. Chernyak derived an exact expression for the partition sum (normalization constant) corresponding to a graphical model, which is an expansion around the Belief Propagation solution. By adding correction terms to the BP free energy, one for each "generalized loop" in the factor graph, the exact partition sum is obtained. However, the usually enormous number of generalized loops generally prohibits summation over all correction terms. In this article we introduce Truncated Loop Series BP (TLSBP), a particular way of truncating the loop series of M. Chertkov and V.Y. Chernyak by considering generalized loops as compositions of simple loops. We analyze the performance of TLSBP in different scenarios, including the Ising model, regular random graphs and on Promedas, a large probabilistic medical diagnostic system. We show that TLSBP often improves upon the accuracy of the BP solution, at the expense of increased computation time. We also show that the performance of TLSBP strongly depends on the degree of interaction between the variables. For weak interactions, truncating the series leads to significant improvements, whereas for strong interactions it can be ineffective, even if a high number of terms is considered.

Submitted 25 July, 2007; v1 submitted 21 December, 2006; originally announced December 2006.

Comments: 31 pages, 12 figures, submitted to Journal of Machine Learning Research

Journal ref: The Journal of Machine Learning Research, 8(Sep):1987--2016, 2007

arXiv:cond-mat/0608312 [pdf, ps, other]

doi 10.1103/PhysRevE.76.011102

On Cavity Approximations for Graphical Models

Authors: T. Rizzo, B. Wemmenhove, H. J. Kappen

Abstract: We reformulate the Cavity Approximation (CA), a class of algorithms recently introduced for improving the Bethe approximation estimates of marginals in graphical models. In our new formulation, which allows for the treatment of multivalued variables, a further generalization to factor graphs with arbitrary order of interaction factors is explicitly carried out, and a message passing algorithm that implements the first order correction to the Bethe approximation is described. Furthermore we investigate an implementation of the CA for pairwise interactions. In all cases considered we could confirm that CA[k] with increasing $k$ provides a sequence of approximations of markedly increasing precision. Furthermore in some cases we could also confirm the general expectation that the approximation of order $k$, whose computational complexity is $O(N^{k+1})$, has an error that scales as $1/N^{k+1}$ with the size of the system. We discuss the relation between this approach and some recent developments in the field.

Submitted 16 January, 2007; v1 submitted 14 August, 2006; originally announced August 2006.

Comments: Extension to factor graphs and comments on related work added

arXiv:cs/0504030 [pdf, ps, other]

doi 10.1109/TIT.2007.909166

Sufficient conditions for convergence of the Sum-Product Algorithm

Authors: Joris M. Mooij, Hilbert J. Kappen

Abstract: We derive novel conditions that guarantee convergence of the Sum-Product algorithm (also known as Loopy Belief Propagation or simply Belief Propagation) to a unique fixed point, irrespective of the initial messages. The computational complexity of the conditions is polynomial in the number of variables. In contrast with previously existing conditions, our results are directly applicable to arbitrary factor graphs (with discrete variables) and are shown to be valid also in the case of factors containing zeros, under some additional conditions. We compare our bounds with existing ones, numerically and, if possible, analytically. For binary variables with pairwise interactions, we derive sufficient conditions that take into account local evidence (i.e., single variable factors) and the type of pair interactions (attractive or repulsive). It is shown empirically that this bound outperforms existing bounds.

Submitted 8 May, 2007; v1 submitted 8 April, 2005; originally announced April 2005.

Comments: 15 pages, 5 figures. Major changes and new results in this revised version. Submitted to IEEE Transactions on Information Theory

ACM Class: I.2.3; F.2.1

Journal ref: IEEE Transactions on Information Theory, 53(12):4422-4437 Dec. 2007

Showing 1–16 of 16 results for author: Kappen, H J