
Arrow of Time in Estimation and Control: Duality Theory Beyond the Linear Gaussian Model

Abstract: Duality between estimation and control is a foundational concept in Control Theory. Most students learn about the elementary duality -- between observability and controllability -- in their first graduate course in linear systems theory. Therefore, it comes as a surprise that for a more general class of nonlinear stochastic systems (hidden Markov models or HMMs), duality is incomplete. Our objective in writing this article is two-fold: (i) To describe the difficulty in extending duality to HMMs; and (ii) To discuss its recent resolution by the authors. A key message is that the main difficulty in extending duality comes from time reversal in going from estimation to control. The reason for time reversal is explained with the aid of the familiar linear deterministic and linear Gaussian models. The explanation is used to motivate the difference between the linear and the nonlinear models. Once the difference is understood, duality for HMMs is described based on our recent work. The article also includes a comparison and discussion of the different types of duality considered in literature.

Submitted 27 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.01127 [pdf, other]

Backward Map for Filter Stability Analysis

Authors: Jin Won Kim, Anant A. Joshi, Prashant G. Mehta

Abstract: In this paper, a backward map is introduced for the purposes of analysis of the nonlinear (stochastic) filter stability. The backward map is important because the filter-stability in the sense of $\chi^2$-divergence follows from showing a certain variance decay property for the backward map. To show this property requires additional assumptions on the model properties of the hidden Markov model (HMM). The analysis in this paper is based on introducing a Poincaré Inequality (PI) for HMMs with white noise observations. In finite state-space settings, PI is related to both the ergodicity of the Markov process as well as the observability of the HMM. It is shown that the Poincaré constant is positive if and only if the HMM is detectable.

Submitted 2 May, 2024; originally announced May 2024.

Comments: arXiv admin note: text overlap with arXiv:2305.12850

arXiv:2404.15779 [pdf, ps, other]

Divergence metrics in the study of Markov and hidden Markov processes

Authors: Jin Won Kim, Amirhossein Taghvaei, Prashant G. Mehta

Abstract: This paper is in two parts. In the first part of this paper, formulae for f-divergence for a continuous Markov process are reviewed. Applications of these formulae are described for the problems of stochastic stability, second law of thermodynamics, and non-equilibrium extensions thereof. The first part sets the stage for considering the f-divergence for hidden Markov processes which is the focus of the second part. For hidden Markov processes, again based on the use of f-divergence, analyses of filter stability and stochastic thermodynamics are described. The latter is used to illustrate the Maxwell demon for an over-damped Langevin model with white noise type observations. The expository nature of the paper combined with consistent formalism for both Markov and hidden Markov processes is written to be useful for researchers working on related themes in disparate areas.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2305.12850 [pdf, other]

doi 10.1109/TAC.2024.3413573

Variance Decay Property for Filter Stability

Authors: Jin Won Kim, Prashant G. Mehta

Abstract: This paper is concerned with the problem of nonlinear (stochastic) filter stability of a hidden Markov model (HMM) with white noise observations. A contribution is the variance decay property which is used to conclude filter stability. For this purpose, a new notion of the Poincaré inequality (PI) is introduced for the nonlinear filter. PI is related to both the ergodicity of the Markov process as well as the observability of the HMM. The proofs are based upon a recently discovered minimum variance duality which is used to transform the nonlinear filtering problem into a stochastic optimal control problem for a backward stochastic differential equation (BSDE).

Submitted 26 June, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

Comments: 16 pages

Journal ref: IEEE Transactions on Automatic Control, 2024

arXiv:2301.00935 [pdf, other]

A Survey of Feedback Particle Filter and related Controlled Interacting Particle Systems (CIPS)

Authors: Amirhossein Taghvaei, Prashant G. Mehta

Abstract: In this survey, we describe controlled interacting particle systems (CIPS) to approximate the solution of the optimal filtering and the optimal control problems. Part I of the survey is focussed on the feedback particle filter (FPF) algorithm, its derivation based on optimal transportation theory, and its relationship to the ensemble Kalman filter (EnKF) and the conventional sequential importance sampling-resampling (SIR) particle filters. The central numerical problem of FPF -- to approximate the solution of the Poisson equation -- is described together with the main solution approaches. An analytical and numerical comparison with the SIR particle filter is given to illustrate the advantages of the CIPS approach. Part II of the survey is focussed on adapting these algorithms for the problem of reinforcement learning. The survey includes several remarks that describe extensions as well as open problems in this subject.

Submitted 20 March, 2023; v1 submitted 2 January, 2023; originally announced January 2023.

arXiv:2208.06587 [pdf, other]

Duality for Nonlinear Filtering II: Optimal Control

Authors: Jin Won Kim, Prashant G. Mehta

Abstract: This paper is concerned with the development and use of duality theory for a nonlinear filtering model with white noise observations. The main contribution of this paper is to introduce a stochastic optimal control problem as a dual to the nonlinear filtering problem. The mathematical statement of the dual relationship between the two problems is given in the form of a duality principle. The constraint for the optimal control problem is the backward stochastic differential equation (BSDE) introduced in the companion paper. The optimal control solution is obtained from an application of the maximum principle, and subsequently used to derive the equation of the nonlinear filter. The proposed duality is shown to be an exact extension of the classical Kalman-Bucy duality, and different from other types of optimal control and variational formulations given in literature.

Submitted 13 August, 2022; originally announced August 2022.

arXiv:2208.06586 [pdf, other]

Duality for Nonlinear Filtering I: Observability

Authors: Jin Won Kim, Prashant G. Mehta

Abstract: This paper is concerned with the development and use of duality theory for a hidden Markov model (HMM) with white noise observations. The main contribution of this work is to introduce a backward stochastic differential equation (BSDE) as a dual control system. A key outcome is that stochastic observability (resp. detectability) of the HMM is expressed in dual terms: as controllability (resp. stabilizability) of the dual control system. All aspects of controllability, namely, definition of controllable space and controllability gramian, along with their properties and explicit formulae, are discussed. The proposed duality is shown to be an exact extension of the classical duality in linear systems theory. One can then relate and compare the linear and the nonlinear systems. A side-by-side summary of this relationship is given in a tabular form (Table II).

Submitted 13 August, 2022; originally announced August 2022.

Comments: arXiv admin note: text overlap with arXiv:2207.07709

arXiv:2206.02222 [pdf, other]

How does a Rational Agent Act in an Epidemic?

Authors: S. Yagiz Olmez, Shubham Aggarwal, Jin Won Kim, Erik Miehling, Tamer Başar, Matthew West, Prashant G. Mehta

Abstract: Evolution of disease in a large population is a function of the top-down policy measures from a centralized planner, as well as the self-interested decisions (to be socially active) of individual agents in a large heterogeneous population. This paper is concerned with understanding the latter based on a mean-field type optimal control model. Specifically, the model is used to investigate the role of partial information on an agent's decision-making, and study the impact of such decisions by a large number of agents on the spread of the virus in the population. The motivation comes from the presymptomatic and asymptomatic spread of the COVID-19 virus where an agent unwittingly spreads the virus. We show that even in a setting with fully rational agents, limited information on the viral state can result in an epidemic growth.

Submitted 5 June, 2022; originally announced June 2022.

Comments: arXiv admin note: text overlap with arXiv:2111.10422

arXiv:2111.10422 [pdf, ps, other]

Modeling Presymptomatic Spread in Epidemics via Mean-Field Games

Authors: S. Yagiz Olmez, Shubham Aggarwal, Jin Won Kim, Erik Miehling, Tamer Başar, Matthew West, Prashant G. Mehta

Abstract: This paper is concerned with developing mean-field game models for the evolution of epidemics. Specifically, an agent's decision -- to be socially active in the midst of an epidemic -- is modeled as a mean-field game with health-related costs and activity-related rewards. By considering the fully and partially observed versions of this problem, the role of information in guiding an agent's rational decision is highlighted. The main contributions of the paper are to derive the equations for the mean-field game in both fully and partially observed settings of the problem, to present a complete analysis of the fully observed case, and to present some analytical results for the partially observed case.

Submitted 19 November, 2021; originally announced November 2021.

arXiv:2111.00109 [pdf, ps, other]

A Dynamic Programming Formulation for the Nonlinear Filter

Authors: Jin Won Kim, Prashant G. Mehta

Abstract: This paper builds on our recent work where we presented a dual stochastic optimal control formulation of the nonlinear filtering problem [1]. The constraint for the dual problem is a backward stochastic differential equation (BSDE). The solution is obtained via an application of the maximum principle (MP). In the present paper, a dynamic programming (DP) principle is presented for a special class of BSDE-constrained stochastic optimal control problems. The principle is applied to derive the solution of the nonlinear filtering problem.

Submitted 29 October, 2021; originally announced November 2021.

arXiv:2107.01244 [pdf, other]

Controlled Interacting Particle Algorithms for Simulation-based Reinforcement Learning

Authors: Anant Joshi, Amirhossein Taghvaei, Prashant G. Mehta, Sean P. Meyn

Abstract: This paper is concerned with optimal control problems for control systems in continuous time, and interacting particle system methods designed to construct approximate control solutions. Particular attention is given to the linear quadratic (LQ) control problem. There is a growing interest in re-visiting this classical problem, in part due to the successes of reinforcement learning (RL). The main question of this body of research (and also of our paper) is to approximate the optimal control law without explicitly solving the Riccati equation. A novel simulation-based algorithm, namely a dual ensemble Kalman filter (EnKF), is introduced. The algorithm is used to obtain formulae for optimal control, expressed entirely in terms of the EnKF particles. An extension to the nonlinear case is also presented. The theoretical results and algorithms are illustrated with numerical experiments.

Submitted 7 July, 2022; v1 submitted 2 July, 2021; originally announced July 2021.

arXiv:2103.14634 [pdf, ps, other]

A Dual Characterization of the Stability of the Wonham Filter

Authors: Jin Won Kim, Prashant G. Mehta

Abstract: This paper revisits the classical question of the stability of the nonlinear Wonham filter. The novel contributions of this paper are two-fold: (i) definition of the stabilizability for the (control-theoretic) dual to the nonlinear filter; and (ii) the use of this definition to obtain conclusions on the stability of the Wonham filter. Specifically, it is shown that the stabilizability of the dual system is necessary for filter stability and conversely stabilizability implies that the filter asymptotically detects the correct ergodic class. The formulation and the proofs are based upon a recently discovered duality result whereby the nonlinear filtering problem is cast as a stochastic optimal control problem for a backward stochastic differential equation (BSDE). The control-theoretic proof techniques and results may be viewed as a generalization of the classical work on the stability of the Kalman filter.

Submitted 8 October, 2021; v1 submitted 26 March, 2021; originally announced March 2021.

Comments: 60th IEEE Conference on Decision and Control (CDC)

arXiv:2103.14631 [pdf, ps, other]

The Conditional Poincaré Inequality for Filter Stability

Authors: Jin Won Kim, Prashant G. Mehta, Sean Meyn

Abstract: This paper is concerned with the problem of nonlinear filter stability of ergodic Markov processes. The main contribution is the conditional Poincaré inequality (PI), which is shown to yield filter stability. The proof is based upon a recently discovered duality which is used to transform the nonlinear filtering problem into a stochastic optimal control problem for a backward stochastic differential equation (BSDE). Based on these dual formalisms, a comparison is drawn between the stochastic stability of a Markov process and the filter stability. The latter relies on the conditional PI described in this paper, whereas the former relies on the standard form of PI.

Submitted 8 October, 2021; v1 submitted 26 March, 2021; originally announced March 2021.

Comments: 60th IEEE Conference on Decision and Control (CDC)

arXiv:2102.10712 [pdf, other]

Optimal Transportation Methods in Nonlinear Filtering: The feedback particle filter

Authors: Amirhossein Taghvaei, Prashant G. Mehta

Abstract: Feedback particle filter (FPF) is a Monte-Carlo (MC) algorithm to approximate the solution of a stochastic filtering problem. In contrast to conventional particle filters, the Bayesian update step in FPF is implemented via a mean-field type feedback control law. The objective for this paper is to situate the development of FPF and related controlled interacting particle system algorithms within the framework of optimal transportation theory. Starting from the simplest setting of the Bayes' update formula, a coupling viewpoint is introduced to construct particle filters. It is shown that the conventional importance sampling resampling particle filter implements an independent coupling. Design of optimal couplings is introduced first for the simple Gaussian settings and subsequently extended to derive the FPF algorithm. The final half of the paper provides a review of some of the salient aspects of the FPF algorithm including the feedback structure, algorithms for gain function design, and comparison with conventional particle filters. The comparison serves to illustrate the benefit of feedback in particle filtering.

Submitted 21 February, 2021; originally announced February 2021.

arXiv:2101.05941 [pdf, other]

Minimum variance constrained estimator

Authors: Prabhat K. Mishra, Girish Chowdhary, Prashant G. Mehta

Abstract: This paper is concerned with the problem of state estimation for discrete-time linear systems in the presence of additional (equality or inequality) constraints on the state (or estimate). By use of the minimum variance duality, the estimation problem is converted into an optimal control problem. Two algorithmic solutions are described: the full information estimator (FIE) and the moving horizon estimator (MHE). The main result is to show that the proposed estimator is stable in the sense of an observer. The proposed algorithm is distinct from the standard algorithm for constrained state estimation based upon the use of the minimum energy duality. The two are compared numerically on the benchmark batch reactor process model.

Submitted 7 December, 2021; v1 submitted 14 January, 2021; originally announced January 2021.

arXiv:2010.09920 [pdf, ps, other]

Optimality vs Stability Trade-off in Ensemble Kalman Filters

Authors: Amirhossein Taghvaei, Prashant G. Mehta, Tryphon T. Georgiou

Abstract: This paper is concerned with optimality and stability analysis of a family of ensemble Kalman filter (EnKF) algorithms. EnKF is commonly used as an alternative to the Kalman filter for high-dimensional problems, where storing the covariance matrix is computationally expensive. The algorithm consists of an ensemble of interacting particles driven by a feedback control law. The control law is designed such that, in the linear Gaussian setting and asymptotic limit of infinitely many particles, the mean and covariance of the particles follow the exact mean and covariance of the Kalman filter. The problem of finding a control law that is exact does not have a unique solution, reminiscent of the problem of finding a transport map between two distributions. A unique control law can be identified by introducing control cost functions, that are motivated by the optimal transportation problem or Schrödinger bridge problem. The objective of this paper is to study the relationship between optimality and long-term stability of a family of exact control laws. Remarkably, the control law that is optimal in the optimal transportation sense leads to an EnKF algorithm that is not stable.

Submitted 18 February, 2022; v1 submitted 19 October, 2020; originally announced October 2020.

arXiv:2010.06655 [pdf, other]

Feedback Particle Filter for Collective Inference

Authors: Jin Won Kim, Amirhossein Taghvaei, Yongxin Chen, Prashant G. Mehta

Abstract: The purpose of this paper is to describe the feedback particle filter algorithm for problems where there are a large number ($M$) of non-interacting agents (targets) with a large number ($M$) of non-agent specific observations (measurements) that originate from these agents. In its basic form, the problem is characterized by data association uncertainty whereby the association between the observations and agents must be deduced in addition to the agent state. In this paper, the large-$M$ limit is interpreted as a problem of collective inference. This viewpoint is used to derive the equation for the empirical distribution of the hidden agent states. A feedback particle filter (FPF) algorithm for this problem is presented and illustrated via numerical simulations. Results are presented for the Euclidean and the finite state-space cases, both in continuous-time settings. The classical FPF algorithm is shown to be the special case (with $M=1$) of these more general results. The simulations help show that the algorithm well approximates the empirical distribution of the hidden states for large $M$.

Submitted 17 February, 2021; v1 submitted 13 October, 2020; originally announced October 2020.

Comments: 15 pages, 2 figures. Submitted to FoDA

MSC Class: 60G35; 62M20 (Primary) 94A12 (Secondary)

arXiv:2010.01226 [pdf, other]

doi 10.23919/ACC50511.2021.9483284

Optimal Control of a Soft CyberOctopus Arm

Authors: Tixian Wang, Udit Halder, Heng-Sheng Chang, Mattia Gazzola, Prashant G. Mehta

Abstract: In this paper, we use the optimal control methodology to control a flexible, elastic Cosserat rod. An inspiration comes from stereotypical movement patterns in octopus arms, which are observed in a variety of manipulation tasks, such as reaching or fetching. To help uncover the mechanisms underlying these observed morphologies, we outline an optimal control-based framework. A single octopus arm is modeled as a Hamiltonian control system, where the continuum mechanics of the arm is modeled after the Cosserat rod theory, and internal, distributed muscle forces and couples are considered as controls. First order necessary optimality conditions are derived for an optimal control problem formulated for this infinite dimensional system. Solutions to this problem are obtained numerically by an iterative forward-backward algorithm. The state and adjoint equations are solved in a dynamic simulation environment, setting the stage for studying a broader class of optimal control problems. Trajectories that minimize control effort are demonstrated and qualitatively compared with observed behaviors.

Submitted 1 April, 2021; v1 submitted 2 October, 2020; originally announced October 2020.

arXiv:2010.01183 [pdf, other]

doi 10.1109/CDC42340.2020.9304260

Deep FPF: Gain function approximation in high-dimensional setting

Authors: S. Yagiz Olmez, Amirhossein Taghvaei, Prashant G. Mehta

Abstract: In this paper, we present a novel approach to approximate the gain function of the feedback particle filter (FPF). The exact gain function is the solution of a Poisson equation involving a probability-weighted Laplacian. The numerical problem is to approximate the exact gain function using only finitely many particles sampled from the probability distribution. Inspired by the recent success of the deep learning methods, we represent the gain function as a gradient of the output of a neural network. Thereupon considering a certain variational formulation of the Poisson equation, an optimization problem is posed for learning the weights of the neural network. A stochastic gradient algorithm is described for this purpose. The proposed approach has two significant properties/advantages: (i) The stochastic optimization algorithm allows one to process, in parallel, only a batch of samples (particles) ensuring good scaling properties with the number of particles; (ii) The remarkable representation power of neural networks means that the algorithm is potentially applicable and useful to solve high-dimensional problems. We numerically establish these two properties and provide extensive comparison to the existing approaches.

Submitted 2 October, 2020; originally announced October 2020.

Comments: To be presented at 59th IEEE Conference on Decision and Control, 2020

arXiv:2008.03559 [pdf, other]

Convex Q-Learning, Part 1: Deterministic Optimal Control

Authors: Prashant G. Mehta, Sean P. Meyn

Abstract: It is well known that the extension of Watkins' algorithm to general function approximation settings is challenging: does the projected Bellman equation have a solution? If so, is the solution useful in the sense of generating a good policy? And, if the preceding questions are answered in the affirmative, is the algorithm consistent? These questions are unanswered even in the special case of Q-function approximations that are linear in the parameter. The challenge seems paradoxical, given the long history of convex analytic approaches to dynamic programming. The paper begins with a brief survey of linear programming approaches to optimal control, leading to a particular over parameterization that lends itself to applications in reinforcement learning. The main conclusions are summarized as follows: (i) The new class of convex Q-learning algorithms is introduced based on the convex relaxation of the Bellman equation. Convergence is established under general conditions, including a linear function approximation for the Q-function. (ii) A batch implementation appears similar to the famed DQN algorithm (one engine behind AlphaZero). It is shown that in fact the algorithms are very different: while convex Q-learning solves a convex program that approximates the Bellman equation, theory for DQN is no stronger than for Watkins' algorithm with function approximation: (a) it is shown that both seek solutions to the same fixed point equation, and (b) the ODE approximations for the two algorithms coincide, and little is known about the stability of this ODE.
These results are obtained for deterministic nonlinear systems with total cost criterion. Many extensions are proposed, including kernel implementation, and extension to MDP models.

Submitted 8 August, 2020; originally announced August 2020.

Comments: This pre-print is written in a tutorial style so it is accessible to new-comers. It will be a part of a handout for upcoming short courses on RL. A more compact version suitable for journal submission is in preparation

MSC Class: 68T05 (Primary) 93E35; 49L20 (Secondary)

arXiv:2005.08145 [pdf, ps, other]

On the Lyapunov Foster criterion and Poincaré inequality for Reversible Markov Chains

Authors: Amirhossein Taghvaei, Prashant G. Mehta

Abstract: This paper presents an elementary proof of stochastic stability of a discrete-time reversible Markov chain starting from a Foster-Lyapunov drift condition. Besides its relative simplicity, there are two salient features of the proof: (i) it relies entirely on functional-analytic non-probabilistic arguments; and (ii) it makes explicit the connection between a Foster-Lyapunov function and Poincaré inequality. The proof is used to derive an explicit bound for the spectral gap. An extension to the non-reversible case is also presented.

Submitted 16 May, 2020; originally announced May 2020.

arXiv:1909.12890 [pdf, ps, other]

A Dual Characterization of Observability for Stochastic Systems

Authors: ** W. Kim, Prashant G. Mehta

Abstract: This paper is concerned with a characterization of the observability for a continuous-time hidden Markov model where the state evolves as a general continuous-time Markov process and the observation process is modeled as a nonlinear function of the state corrupted by Gaussian measurement noise. The main technical tool is based on the recently discovered duality relationship between minimum variance estimation and stochastic optimal control: The observability is defined as a dual of the controllability for a certain backward stochastic differential equation. Based on the dual formulation, a test for observability is presented and related to the literature. The proposed duality-based framework allows one to easily relate and compare the linear and the nonlinear systems. A side-by-side summary of this relationship is given in a tabular form (Table 1).

Submitted 21 February, 2020; v1 submitted 27 September, 2019; originally announced September 2019.

Comments: 7 pages, Revised to be submitted to 2020 MTNS Conference

arXiv:1904.01710 [pdf, ps, other]

doi 10.1007/978-3-030-51264-4_12

An Optimal Control Derivation of Nonlinear Smoothing Equations

Authors: ** W. Kim, Prashant G. Mehta

Abstract: The purpose of this paper is to review and highlight some connections between the problem of nonlinear smoothing and optimal control of the Liouville equation. The latter has been an active area of recent research interest owing to work in mean-field games and optimal transportation theory. The nonlinear smoothing problem is considered here for continuous-time Markov processes. The observation process is modeled as a nonlinear function of a hidden state with an additive Gaussian measurement noise. A variational formulation is described based upon the relative entropy formula introduced by Newton and Mitter. The resulting optimal control problem is formulated on the space of probability distributions. Hamilton's equations of the optimal control problem are related to the Zakai equation of nonlinear smoothing via the log transformation. The overall procedure is shown to generalize the classical Mortensen's minimum energy estimator for the linear Gaussian problem.

Submitted 22 March, 2023; v1 submitted 2 April, 2019; originally announced April 2019.

Journal ref: In: Advances in Dynamics, Optimization and Computation. SON 2020. Studies in Systems, Decision and Control, vol 304. Springer, pp. 295-311 (2020)

arXiv:1903.11195 [pdf, ps, other]

What is the Lagrangian for Nonlinear Filtering?

Authors: ** W. Kim, Prashant G. Mehta, Sean P. Meyn

Abstract: Duality between estimation and optimal control is a problem of rich historical significance. The first duality principle appears in the seminal paper of Kalman-Bucy, where the problem of minimum variance estimation is shown to be dual to a linear quadratic (LQ) optimal control problem. Duality offers a constructive proof technique to derive the Kalman filter equation from the optimal control solution. This paper generalizes the classical duality result of Kalman-Bucy to the nonlinear filter: The state evolves as a continuous-time Markov process and the observation is a nonlinear function of state corrupted by an additive Gaussian noise. A dual process is introduced as a backward stochastic differential equation (BSDE). The process is used to transform the problem of minimum variance estimation into an optimal control problem. Its solution is obtained from an application of the maximum principle, and subsequently used to derive the equation of the nonlinear filter. The classical duality result of Kalman-Bucy is shown to be a special case.

Submitted 24 October, 2019; v1 submitted 26 March, 2019; originally announced March 2019.

Comments: 8 pages, 58th IEEE Conference on Decision and Control (Dec. 2019)

arXiv:1902.07263 [pdf, other]

Diffusion map-based algorithm for Gain function approximation in the Feedback Particle Filter

Authors: Amirhossein Taghvaei, Prashant G. Mehta, Sean P. Meyn

Abstract: Feedback particle filter (FPF) is a numerical algorithm to approximate the solution of the nonlinear filtering problem in continuous-time settings. In any numerical implementation of the FPF algorithm, the main challenge is to numerically approximate the so-called gain function. A numerical algorithm for gain function approximation is the subject of this paper. The exact gain function is the solution of a Poisson equation involving a probability-weighted Laplacian $Δ_ρ$. The numerical problem is to approximate this solution using only finitely many particles sampled from the probability distribution $ρ$. A diffusion map-based algorithm was proposed by the authors in a prior work to solve this problem. The algorithm is named as such because it involves, as an intermediate step, a diffusion map approximation of the exact semigroup $e^{Δ_ρ}$. The original contribution of this paper is to carry out a rigorous error analysis of the diffusion map-based algorithm. The error is shown to include two components: bias and variance. The bias results from the diffusion map approximation of the exact semigroup. The variance arises because of finite sample size. Scalings and upper bounds are derived for bias and variance. These bounds are then illustrated with numerical experiments that serve to emphasize the effects of problem dimension and sample size. The proposed algorithm is applied to two filtering examples and comparisons provided with the sequential importance resampling (SIR) particle filter.

Submitted 30 September, 2019; v1 submitted 19 February, 2019; originally announced February 2019.

arXiv:1901.03317 [pdf, other]

Accelerated Flow for Probability Distributions

Authors: Amirhossein Taghvaei, Prashant G. Mehta

Abstract: This paper presents a methodology and numerical algorithms for constructing accelerated gradient flows on the space of probability distributions. In particular, we extend the recent variational formulation of accelerated gradient methods in (Wibisono et al., 2016) from vector-valued variables to probability distributions. The variational problem is modeled as a mean-field optimal control problem. The maximum principle of optimal control theory is used to derive Hamilton's equations for the optimal gradient flow. Hamilton's equations are shown to achieve the accelerated form of density transport from any initial probability distribution to a target probability distribution. A quantitative estimate on the asymptotic convergence rate is provided based on a Lyapunov function construction, when the objective functional is displacement convex. Two numerical approximations are presented to implement Hamilton's equations as a system of $N$ interacting particles. The continuous limit of Nesterov's algorithm is shown to be a special case with $N=1$. The algorithm is illustrated with numerical examples.

Submitted 10 January, 2019; v1 submitted 10 January, 2019; originally announced January 2019.

arXiv:1809.10762 [pdf, ps, other]

An Approach to Duality in Nonlinear Filtering

Authors: ** W. Kim, Amirhossein Taghvaei, Prashant G. Mehta, Sean P. Meyn

Abstract: This paper revisits the question of duality between minimum variance estimation and optimal control first described for the linear Gaussian case in the celebrated paper of Kalman and Bucy. A duality result is established for nonlinear filtering, mirroring closely the original Kalman-Bucy duality of control and estimation for linear systems. The result for the finite state-space continuous-time Markov chain is presented. Its solution is used to derive the classical Wonham filter.

Submitted 26 March, 2019; v1 submitted 27 September, 2018; originally announced September 2018.

Comments: 6 pages, 2019 American Control Conference

arXiv:1809.07892 [pdf, ps, other]

Error Analysis of the Stochastic Linear Feedback Particle Filter

Authors: Amirhossein Taghvaei, Prashant G. Mehta

Abstract: This paper is concerned with the convergence and long-term stability analysis of the feedback particle filter (FPF) algorithm. The FPF is an interacting system of $N$ particles where the interaction is designed such that the empirical distribution of the particles approximates the posterior distribution. It is known that in the mean-field limit ($N=\infty$), the distribution of the particles is equal to the posterior distribution. However, little is known about the convergence to the mean-field limit. In this paper, we consider the FPF algorithm for the linear Gaussian setting. In this setting, the algorithm is similar to the ensemble Kalman-Bucy filter algorithm. Although these algorithms have been numerically evaluated and widely used in applications, their convergence and long-term stability analysis remains an active area of research. In this paper, we show that (i) the mean-field limit is well-defined with a unique strong solution; (ii) the mean-field process is stable with respect to the initial condition; and (iii) under suitable conditions, the finite-$N$ system is long-term stable, with mean-squared error estimates that are uniform in time.

Submitted 20 September, 2018; originally announced September 2018.

arXiv:1804.04199 [pdf, ps, other]

Derivation and Extensions of the Linear Feedback Particle Filter based on Duality Formalisms

Authors: ** W. Kim, Amirhossein Taghvaei, Prashant G. Mehta

Abstract: This paper is concerned with a duality-based approach to derive the linear feedback particle filter (FPF). The FPF is a controlled interacting particle system where the control law is designed to provide an exact solution for the nonlinear filtering problem. For the linear Gaussian special case, certain simplifications arise whereby the linear FPF is identical to the square-root form of the ensemble Kalman filter. For this and for the more general nonlinear non-Gaussian case, it has been an open problem to derive/interpret the FPF control law as a solution of an optimal control problem. In this paper, certain duality-based arguments are employed to transform the filtering problem into an optimal control problem. Its solution is shown to yield the deterministic form of the linear FPF. An extension is described to incorporate stochastic effects due to noise leading to a novel homotopy of exact ensemble Kalman filters. All the derivations are based on duality formalisms.

Submitted 11 April, 2018; originally announced April 2018.

arXiv:1710.11008 [pdf, other]

Error Analysis for the Linear Feedback Particle Filter

Authors: Amirhossein Taghvaei, Prashant G. Mehta

Abstract: This paper is concerned with the convergence and the error analysis for the feedback particle filter (FPF) algorithm. The FPF is a controlled interacting particle system where the control law is designed to solve the nonlinear filtering problem. For the linear Gaussian case, certain simplifications arise whereby the linear FPF reduces to one form of the ensemble Kalman filter. For this and for the more general nonlinear non-Gaussian case, it has been an open problem to relate the convergence and error properties of the finite-N algorithm to the mean-field limit (where the exactness results have been obtained). In this paper, the equations for empirical mean and covariance are derived for the finite-N linear FPF. Remarkably, for a certain deterministic form of FPF, the equations for mean and variance are identical to the Kalman filter. This allows strong conclusions on convergence and error properties based on the classical filter stability theory for the Kalman filter. It is shown that the error converges to zero even with a finite number of particles. The paper also presents propagation of chaos estimates for the finite-N linear filter. The error estimates are illustrated with numerical experiments.

Submitted 30 October, 2017; originally announced October 2017.

arXiv:1709.09625 [pdf, other]

How regularization affects the critical points in linear networks

Authors: Amirhossein Taghvaei, ** W. Kim, Prashant G. Mehta

Abstract: This paper is concerned with the problem of representing and learning a linear transformation using a linear neural network. In recent years, there has been a growing interest in the study of such networks in part due to the successes of deep learning. The main question of this body of research and also of this paper pertains to the existence and optimality properties of the critical points of the mean-squared loss function. The primary concern here is the robustness of the critical points with regularization of the loss function. An optimal control model is introduced for this purpose and a learning algorithm (regularized form of backprop) derived for the same using Hamilton's formulation of optimal control. The formulation is used to provide a complete characterization of the critical points in terms of the solutions of a nonlinear matrix-valued equation, referred to as the characteristic equation. Analytical and numerical tools from bifurcation theory are used to compute the critical points via the solutions of the characteristic equation. The main conclusion is that the critical point diagram can be fundamentally different even with arbitrarily small amounts of regularization.

Submitted 27 September, 2017; originally announced September 2017.

arXiv:1702.07241 [pdf, ps, other]

Kalman Filter and its Modern Extensions for the Continuous-time Nonlinear Filtering Problem

Authors: Amirhossein Taghvaei, Jana de Wiljes, Prashant G. Mehta, Sebastian Reich

Abstract: This paper is concerned with the filtering problem in continuous-time. Three algorithmic solution approaches for this problem are reviewed: (i) the classical Kalman-Bucy filter which provides an exact solution for the linear Gaussian problem, (ii) the ensemble Kalman-Bucy filter (EnKBF) which is an approximate filter and represents an extension of the Kalman-Bucy filter to nonlinear problems, and (iii) the feedback particle filter (FPF) which represents an extension of the EnKBF and furthermore provides for a consistent solution in the general nonlinear, non-Gaussian case. The common feature of the three algorithms is the gain times error formula to implement the update step (to account for conditioning due to the observations) in the filter. In contrast to the commonly used sequential Monte Carlo methods, the EnKBF and FPF avoid the resampling of the particles in the importance sampling update step. Moreover, the feedback control structure provides for error correction potentially leading to smaller simulation variance and improved stability properties. The paper also discusses the issue of non-uniqueness of the filter update formula and formulates a novel approximation algorithm based on ideas from optimal transport and coupling of measures. Performance of this and other algorithms is illustrated for a numerical example.

Submitted 21 December, 2017; v1 submitted 21 February, 2017; originally announced February 2017.

arXiv:1701.02416 [pdf, other]

Feedback Particle Filter on Matrix Lie Groups

Authors: Chi Zhang, Amirhossein Taghvaei, Prashant G. Mehta

Abstract: This paper is concerned with the problem of continuous-time nonlinear filtering for stochastic processes on a connected matrix Lie group. The main contribution of this paper is to derive the feedback particle filter (FPF) algorithm for this problem. In its general form, the FPF is shown to provide a coordinate-free description of the filter that automatically satisfies the geometric constraints of the manifold. The particle dynamics are encapsulated in a Stratonovich stochastic differential equation that retains the feedback structure of the original (Euclidean) FPF. The implementation of the filter requires a solution of a Poisson equation on the Lie group, and two numerical algorithms are described for this purpose. As an example, the FPF is applied to the problem of attitude estimation - a nonlinear filtering problem on the Lie group SO(3). The formulae of the filter are described using both the rotation matrix and the quaternion coordinates. Comparisons are also provided between the FPF and some popular algorithms for attitude estimation, namely the multiplicative EKF, the unscented quaternion estimator, the left invariant EKF, and the invariant ensemble Kalman filter. Numerical simulations are presented to illustrate the comparisons.

Submitted 9 January, 2017; originally announced January 2017.

Comments: 33 pages

arXiv:1701.02413 [pdf, other]

A Controlled Particle Filter for Global Optimization

Authors: Chi Zhang, Amirhossein Taghvaei, Prashant G. Mehta

Abstract: A particle filter is introduced to numerically approximate a solution of the global optimization problem. The theoretical significance of this work comes from its variational aspects: (i) the proposed particle filter is a controlled interacting particle system where the control input represents the solution of a mean-field type optimal control problem; and (ii) the associated density transport is shown to be a gradient flow (steepest descent) for the optimal value function, with respect to the Kullback-Leibler divergence. The optimal control construction of the particle filter is a significant departure from the classical importance sampling-resampling based approaches. There are several practical advantages: (i) resampling, reproduction, death or birth of particles is avoided; (ii) simulation variance can potentially be reduced by applying feedback control principles; and (iii) the parametric approximation naturally arises as a special case. The latter also suggests systematic approaches for numerical approximation of the optimal control law. The theoretical results are illustrated with numerical examples.

Submitted 9 January, 2017; originally announced January 2017.

Comments: 33 pages

arXiv:1612.05606 [pdf, other]

Error Estimates for the Kernel Gain Function Approximation in the Feedback Particle Filter

Authors: Amirhossein Taghvaei, Prashant G. Mehta, Sean P. Meyn

Abstract: This paper is concerned with the analysis of the kernel-based algorithm for gain function approximation in the feedback particle filter. The exact gain function is the solution of a Poisson equation involving a probability-weighted Laplacian. The kernel-based method -- introduced in our prior work -- allows one to approximate this solution using only particles sampled from the probability distribution. This paper describes new representations and algorithms based on the kernel-based method. Theory surrounding the approximation is improved and a novel formula for the gain function approximation is derived. A procedure for carrying out error analysis of the approximation is introduced. Certain asymptotic estimates for bias and variance are derived for the general nonlinear non-Gaussian case. Comparison with the constant gain function approximation is provided. The results are illustrated with the aid of some numerical experiments.

Submitted 16 December, 2016; originally announced December 2016.

arXiv:1604.01371 [pdf, other]

Attitude Estimation with Feedback Particle Filter

Authors: Chi Zhang, Amirhossein Taghvaei, Prashant G. Mehta

Abstract: This paper presents theory, application, and comparisons of the feedback particle filter (FPF) algorithm for the problem of attitude estimation. The paper builds upon our recent work on the exact FPF solution of the continuous-time nonlinear filtering problem on compact Lie groups. In this paper, the details of the FPF algorithm are presented for the problem of attitude estimation - a nonlinear filtering problem on SO(3). The quaternions are employed for computational purposes. The algorithm requires a numerical solution of the filter gain function, and two methods are applied for this purpose. Comparisons are also provided between the FPF and some popular algorithms for attitude estimation on SO(3), including the invariant EKF, the multiplicative EKF, and the unscented Kalman filter. Simulation results are presented that help illustrate the comparisons.

Submitted 5 April, 2016; originally announced April 2016.

Comments: 8 pages, 2 figures

arXiv:1603.05496 [pdf, other]

Gain Function Approximation in the Feedback Particle Filter

Authors: Amirhossein Taghvaei, Prashant G. Mehta

Abstract: This paper is concerned with numerical algorithms for gain function approximation in the feedback particle filter. The exact gain function is the solution of a Poisson equation involving a probability-weighted Laplacian. The problem is to approximate this solution using only particles sampled from the probability distribution. Two algorithms are presented: a Galerkin algorithm and a kernel-based algorithm. Both algorithms are adapted to the samples and do not require approximation of the probability distribution as an intermediate step. The paper contains error analysis for the algorithms as well as some comparative numerical results for a non-Gaussian distribution. These algorithms are also applied and illustrated for a simple nonlinear filtering example.

Submitted 17 March, 2016; originally announced March 2016.

arXiv:1510.01948 [pdf, other]

An Optimal Transport Formulation of the Linear Feedback Particle Filter

Authors: Amirhossein Taghvaei, Prashant G. Mehta

Abstract: Feedback particle filter (FPF) is an algorithm to numerically approximate the solution of the nonlinear filtering problem in continuous time. The algorithm implements a feedback control law for a system of particles such that the empirical distribution of particles approximates the posterior distribution. However, it has been noted in the literature that the feedback control law is not unique. To find a unique control law, the filtering task is formulated here as an optimal transportation problem between the prior and the posterior distributions. Based on this formulation, a time-stepping optimization procedure is proposed for the optimal control design. A key difference between the optimal control law and the one in the original FPF is the replacement of the noise term with a deterministic term. This difference serves to decrease the simulation variance, as illustrated with a simple numerical example.

Submitted 7 October, 2015; originally announced October 2015.

arXiv:1510.01259 [pdf, ps, other]

Feedback Particle Filter on Matrix Lie Groups

Authors: Chi Zhang, Amirhossein Taghvaei, Prashant G. Mehta

Abstract: This paper is concerned with the problem of continuous-time nonlinear filtering for stochastic processes on a compact and connected matrix Lie group without boundary, e.g. SO(n) and SE(n), in the presence of real-valued observations. This problem is important to numerous applications in attitude estimation, visual tracking and robotic localization. The main contribution of this paper is to derive the feedback particle filter (FPF) algorithm for this problem. In its general form, the FPF provides a coordinate-free description of the filter that furthermore satisfies the geometric constraints of the manifold. The particle dynamics are encapsulated in a Stratonovich stochastic differential equation that preserves the feedback structure of the original Euclidean FPF. Specific examples for SO(2) and SO(3) are provided to help illustrate the filter using the phase and the quaternion coordinates, respectively.

Submitted 5 October, 2015; originally announced October 2015.

Comments: 7 pages, submitted for 2016 American Control Conference

arXiv:1412.5845 [pdf, ps, other]

Poisson's equation in nonlinear filtering

Authors: Richard S. Laugesen, Prashant G. Mehta, Sean P. Meyn, Maxim Raginsky

Abstract: The aim of this paper is to provide a variational interpretation of the nonlinear filter in continuous time. A time-stepping procedure is introduced, consisting of successive minimization problems in the space of probability densities. The weak form of the nonlinear filter is derived via analysis of the first-order optimality conditions for these problems. The derivation shows the nonlinear filter dynamics may be regarded as a gradient flow, or a steepest descent, for a certain energy functional with respect to the Kullback-Leibler divergence. The second part of the paper is concerned with derivation of the feedback particle filter algorithm, based again on the analysis of the first variation. The algorithm is shown to be exact. That is, the posterior distribution of the particle matches exactly the true posterior, provided the filter is initialized with the true prior.

Submitted 18 December, 2014; originally announced December 2014.

Comments: 25 pages; accepted to SIAM Journal on Control and Optimization

arXiv:1404.4386 [pdf, other]

Probabilistic Data Association-Feedback Particle Filter for Multiple Target Tracking Applications

Authors: Tao Yang, Prashant G. Mehta

Abstract: This paper is concerned with the problem of tracking single or multiple targets with multiple non-target specific observations (measurements). For such filtering problems with data association uncertainty, a novel feedback control-based particle filter algorithm is introduced. The algorithm is referred to as the probabilistic data association-feedback particle filter (PDA-FPF). The proposed filter is shown to represent a generalization to the nonlinear non-Gaussian case of the classical Kalman filter-based probabilistic data association filter (PDAF). One remarkable conclusion is that the proposed PDA-FPF algorithm retains the innovation error-based feedback structure of the classical PDAF algorithm, even in the nonlinear non-Gaussian case. The theoretical results are illustrated with the aid of numerical examples motivated by multiple target tracking applications.

Submitted 16 April, 2014; originally announced April 2014.

arXiv:1305.5977 [pdf, other]

Interacting Multiple Model-Feedback Particle Filter for Stochastic Hybrid Systems

Authors: Tao Yang, Henk A. P. Blom, Prashant G. Mehta

Abstract: In this paper, a novel feedback control-based particle filter algorithm for the continuous-time stochastic hybrid system estimation problem is presented. This particle filter is referred to as the interacting multiple model-feedback particle filter (IMM-FPF), and is based on the recently developed feedback particle filter. The IMM-FPF is comprised of a series of parallel FPFs, one for each discrete mode, and an exact filter recursion for the mode association probability. The proposed IMM-FPF represents a generalization of the Kalman filter-based IMM algorithm to the general nonlinear filtering problem. The remarkable conclusion of this paper is that the IMM-FPF algorithm retains the innovation error-based feedback structure even for the nonlinear problem. The interaction/merging process is also handled via a control-based approach. The theoretical results are illustrated with the aid of a numerical example problem for a maneuvering target tracking application.

Submitted 25 May, 2013; originally announced May 2013.

arXiv:1303.1214 [pdf, ps, other]

Joint Probabilistic Data Association-Feedback Particle Filter for Multiple Target Tracking Applications

Authors: Tao Yang, Geng Huang, Prashant G. Mehta

Abstract: This paper introduces a novel feedback-control based particle filter for the solution of the filtering problem with data association uncertainty. The particle filter is referred to as the joint probabilistic data association-feedback particle filter (JPDA-FPF). The JPDA-FPF is based on the feedback particle filter introduced in our earlier papers. The remarkable conclusion of our paper is that the JPDA-FPF algorithm retains the innovation error-based feedback structure of the feedback particle filter, even with data association uncertainty in the general nonlinear case. The theoretical results are illustrated with the aid of two numerical example problems drawn from multiple target tracking applications.

Submitted 5 March, 2013; originally announced March 2013.

Comments: In Proc. of the 2012 American Control Conference

arXiv:1303.1205 [pdf, other]

doi 10.1109/CDC.2012.6425937

Multivariable Feedback Particle Filter

Authors: Tao Yang, Richard S. Laugesen, Prashant G. Mehta, Sean P. Meyn

Abstract: In recent work it is shown that importance sampling can be avoided in the particle filter through an innovation structure inspired by traditional nonlinear filtering combined with Mean-Field Game formalisms. The resulting feedback particle filter (FPF) offers significant variance improvements; in particular, the algorithm can be applied to systems that are not stable. The filter comes with an up-front computational cost to obtain the filter gain. This paper describes new representations and algorithms to compute the gain in the general multivariable setting. The main contributions are: (i) Theory surrounding the FPF is improved: consistency is established in the multivariate setting, as well as well-posedness of the associated PDE to obtain the filter gain. (ii) The gain can be expressed as the gradient of a function, which is precisely the solution to Poisson's equation for a related MCMC diffusion (the Smoluchowski equation). This provides a bridge to MCMC as well as to approximate optimal filtering approaches such as TD-learning, which can in turn be used to approximate the gain. (iii) Motivated by a weak formulation of Poisson's equation, a Galerkin finite-element algorithm is proposed for approximation of the gain. Its performance is illustrated in numerical experiments.

Submitted 5 March, 2013; originally announced March 2013.

Comments: In Proc. of 51st IEEE Conference on Decision and Control

arXiv:1302.6563 [pdf, other]

Feedback Particle Filter

Authors: Tao Yang, Prashant G. Mehta, Sean P. Meyn

Abstract: A new formulation of the particle filter for nonlinear filtering is presented, based on concepts from optimal control and from mean-field game theory. The optimal control is chosen so that the posterior distribution of a particle matches as closely as possible the posterior distribution of the true state given the observations. This is achieved by introducing a cost function, defined by the Kullback-Leibler (K-L) divergence between the actual posterior and the posterior of any particle. The optimal control input is characterized by a certain Euler-Lagrange (E-L) equation, and is shown to admit an innovation error-based feedback structure. For diffusions with continuous observations, the value of the optimal control solution is ideal. The two posteriors match exactly, provided they are initialized with identical priors. The feedback particle filter is defined by a family of stochastic systems, each evolving under this optimal control law. A numerical algorithm is introduced and implemented in two general examples, and a neuroscience application involving coupled oscillators. Some preliminary numerical comparisons between the feedback particle filter and the bootstrap particle filter are described.

Submitted 26 February, 2013; originally announced February 2013.

arXiv:1005.0351 [pdf, ps, other]

doi 10.1109/TAC.2010.2103416

Stability Margin Scaling Laws for Distributed Formation Control as a Function of Network Structure

Authors: He Hao, Prabir Barooah, Prashant G. Mehta

Abstract: We consider the problem of distributed formation control of a large number of vehicles. An individual vehicle in the formation is assumed to be a fully actuated point mass. A distributed control law is examined: the control action on an individual vehicle depends on (i) its own velocity and (ii) the relative position measurements with a small subset of vehicles (neighbors) in the formation. The neighbors are defined according to an information graph. In this paper we describe a methodology for modeling, analysis, and distributed control design of such vehicular formations whose information graph is a D-dimensional lattice. The modeling relies on an approximation based on a partial differential equation (PDE) that describes the spatio-temporal evolution of position errors in the formation. The analysis and control design is based on the PDE model. We deduce asymptotic formulae for the closed-loop stability margin (absolute value of the real part of the least stable eigenvalue) of the controlled formation. The stability margin is shown to approach 0 as the number of vehicles N goes to infinity. The exponent on the scaling law for the stability margin is influenced by the dimension and the structure of the information graph. We show that the scaling law can be improved by employing a higher dimensional information graph. Apart from analysis, the PDE model is used for a mistuning-based design of control gains to maximize the stability margin. Mistuning here refers to small perturbation of control gains from their nominal symmetric values. We show that the mistuned design can have a significantly better stability margin even with a small amount of perturbation. The results of the analysis with the PDE model are corroborated with numerical computation of eigenvalues with the state-space model of the formation.

Submitted 12 October, 2010; v1 submitted 3 May, 2010; originally announced May 2010.

Comments: This paper is the expanded version of the paper with the same name which is accepted by the IEEE Transactions on Automatic Control. The final version is updated on Oct. 12, 2010

arXiv:0812.1030 [pdf, ps, other]

doi 10.1109/TAC.2009.2026934

Mistuning-based Control Design to Improve Closed-Loop Stability of Vehicular Platoons

Authors: Prabir Barooah, Prashant G. Mehta, Joao P Hespanha

Abstract: We consider a decentralized bidirectional control of a platoon of N identical vehicles moving in a straight line. The control objective is for each vehicle to maintain a constant velocity and inter-vehicular separation using only the local information from itself and its two nearest neighbors. Each vehicle is modeled as a double integrator. To aid the analysis, we use a continuous approximation to derive a partial differential equation (PDE) approximation of the discrete platoon dynamics. The PDE model is used to explain the progressive loss of closed-loop stability with increasing number of vehicles, and to devise ways to combat this loss of stability. If every vehicle uses the same controller, we show that the least stable closed-loop eigenvalue approaches zero as O(1/N^2) in the limit of a large number (N) of vehicles. We then show how to ameliorate this loss of stability by small amounts of "mistuning", i.e., changing the controller gains from their nominal values. We prove that with arbitrarily small amounts of mistuning, the asymptotic behavior of the least stable closed-loop eigenvalue can be improved to O(1/N). All the conclusions drawn from analysis of the PDE model are corroborated via numerical calculations of the state-space platoon model.

Submitted 4 December, 2008; originally announced December 2008.

Comments: 14 pages, 11 figures, to appear in IEEE Transactions on Automatic Control in 2009/2010
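
The O(1/N^2) scaling stated in the abstract above for identical controllers can be checked numerically on a small state-space model. The following is a minimal sketch under stated assumptions, not the authors' code: the gains `k` and `b`, the function name `stability_margin`, and the boundary conditions (a fictitious leader ahead of the first vehicle, no vehicle behind the last one) are all illustrative choices. Each vehicle is a double integrator driven by the relative position errors to its two nearest neighbors and by its own velocity error.

```python
import numpy as np


def stability_margin(n, k=1.0, b=1.0):
    """Distance from the imaginary axis of the least stable closed-loop eigenvalue.

    Assumed model (illustrative): position errors x obey
        x'' = -k * L @ x - b * x'
    where L couples each vehicle to its front and back neighbors.
    """
    # Nearest-neighbor coupling: 2 on the diagonal, -1 on the off-diagonals.
    L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    # Assumption: the last vehicle has no follower, so it has one neighbor only.
    L[-1, -1] = 1.0

    # First-order state-space form with state (x, x').
    A = np.block([
        [np.zeros((n, n)), np.eye(n)],
        [-k * L, -b * np.eye(n)],
    ])
    eigenvalues = np.linalg.eigvals(A)
    # Stability margin: absolute value of the real part of the least stable eigenvalue.
    return float(np.min(np.abs(eigenvalues.real)))


if __name__ == "__main__":
    for n in (25, 50, 100):
        print(n, stability_margin(n))
```

If the O(1/N^2) law holds for this simplified model, doubling the number of vehicles should shrink the margin by a factor of roughly four. The mistuned design described in the abstract is not reproduced here.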

Showing 1–47 of 47 results for author: Mehta, P G