-
From Optimization to Control: Quasi Policy Iteration
Authors:
Mohammad Amin Sharifi Kolarijani,
Peyman Mohajerin Esfahani
Abstract:
Recent control algorithms for Markov decision processes (MDPs) have been designed using an implicit analogy with well-established optimization algorithms. In this paper, we make this analogy explicit across four problem classes with a unified solution characterization. This novel framework, in turn, allows for a systematic transformation of algorithms from one domain to the other. In particular, we identify equivalent optimization and control algorithms that have already been pointed out in the existing literature, but mostly in a scattered way. With this unifying framework in mind, we then exploit two linear structural constraints specific to MDPs for approximating the Hessian in a second-order-type algorithm from optimization, namely, Anderson mixing. This leads to a novel first-order control algorithm that modifies the standard value iteration (VI) algorithm by incorporating two new directions and adaptive step sizes. While the proposed algorithm, termed quasi-policy iteration, has the same computational complexity as VI, it interestingly exhibits an empirical convergence behavior similar to policy iteration with a very low sensitivity to the discount factor.
Submitted 18 November, 2023;
originally announced November 2023.
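For orientation, the standard value iteration (VI) baseline that the abstract above refers to can be sketched as follows. This is a minimal illustration of plain VI for a finite, discounted-cost MDP, not the quasi-policy iteration algorithm proposed in the paper. The function name, the data layout `P[a][s][t]` and `c[s][a]`, and the two-state example are assumptions made for this sketch.

```python
def value_iteration(P, c, gamma, tol=1e-8, max_iter=10_000):
    """Plain value iteration for a finite, discounted-cost MDP (illustrative).

    P[a][s][t]: probability of moving from state s to state t under action a.
    c[s][a]:    stage cost of taking action a in state s.
    Returns the approximate optimal value function and a greedy policy.
    """
    n_states, n_actions = len(c), len(c[0])
    v = [0.0] * n_states
    for _ in range(max_iter):
        # Bellman backup: Q(s, a) = c(s, a) + gamma * E[v(next state)]
        q = [[c[s][a] + gamma * sum(P[a][s][t] * v[t] for t in range(n_states))
              for a in range(n_actions)]
             for s in range(n_states)]
        v_new = [min(row) for row in q]
        diff = max(abs(x - y) for x, y in zip(v_new, v))
        v = v_new
        if diff < tol:
            break
    # Greedy policy with respect to the last computed Q-values
    policy = [min(range(n_actions), key=lambda a: q[s][a])
              for s in range(n_states)]
    return v, policy


# Hypothetical two-state example: action 0 stays, action 1 switches state.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
]
c = [[1.0, 1.5], [0.0, 1.5]]   # c[s][a]
v, policy = value_iteration(P, c, gamma=0.5)
```

Each sweep costs on the order of (states)² × (actions) operations, and the contraction rate is governed by the discount factor, which is the sensitivity the abstract contrasts with policy iteration.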
-
Fast Approximate Dynamic Programming for Infinite-Horizon Markov Decision Processes
Authors:
M. A. S. Kolarijani,
G. F. Max,
P. Mohajerin Esfahani
Abstract:
In this study, we consider the infinite-horizon, discounted cost, optimal control of stochastic nonlinear systems with separable cost and constraints in the state and input variables. Using the linear-time Legendre transform, we propose a novel numerical scheme for implementation of the corresponding value iteration (VI) algorithm in the conjugate domain. Detailed analyses of the convergence, time complexity, and error of the proposed algorithm are provided. In particular, with a discretization of size $X$ and $U$ for the state and input spaces, respectively, the proposed approach reduces the time complexity of each iteration in the VI algorithm from $O(XU)$ to $O(X+U)$, by replacing the minimization operation in the primal domain with a simple addition in the conjugate domain.
Submitted 17 March, 2022; v1 submitted 17 February, 2021;
originally announced February 2021.
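The general principle behind "replacing the minimization in the primal domain with an addition in the conjugate domain" can be illustrated with the discrete Legendre (convex conjugate) transform: the conjugate of an infimal convolution equals the sum of the conjugates. The sketch below checks this identity on small integer grids using a brute-force conjugate. It is not the paper's numerical scheme and does not implement the linear-time Legendre transform; the function names, grids, and test functions are assumptions chosen for illustration.

```python
def conjugate(points, values, slopes):
    """Brute-force discrete conjugate: f*(s) = max_x (s * x - f(x))."""
    return [max(s * x - fx for x, fx in zip(points, values)) for s in slopes]


def inf_convolution(xs, f, ys, g):
    """Infimal convolution on grids: h(z) = min over x + y = z of f(x) + g(y)."""
    best = {}
    for x, fx in zip(xs, f):
        for y, gy in zip(ys, g):
            z = x + y
            best[z] = min(best.get(z, float("inf")), fx + gy)
    zs = sorted(best)
    return zs, [best[z] for z in zs]


# Hypothetical data: f(x) = x^2 and g(y) = |y| on the same integer grid.
grid = [-2, -1, 0, 1, 2]
f = [x * x for x in grid]
g = [abs(y) for y in grid]
slopes = [-1.0, 0.0, 0.5, 2.0]

# Primal route: minimize first (cost grows with the product of grid sizes) ...
zs, h = inf_convolution(grid, f, grid, g)
lhs = conjugate(zs, h, slopes)

# ... conjugate route: transform each function separately, then add.
rhs = [a + b for a, b in zip(conjugate(grid, f, slopes),
                             conjugate(grid, g, slopes))]
```

Here `lhs` and `rhs` agree slope by slope. The brute-force `conjugate` is itself quadratic in the grid sizes, so any speed-up of the kind described in the abstract depends on a fast transform, which this sketch omits.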
-
Fast Approximate Dynamic Programming for Input-Affine Dynamics
Authors:
M. A. S. Kolarijani,
P. Mohajerin Esfahani
Abstract:
We propose two novel numerical schemes for approximate implementation of the dynamic programming (DP) operation concerned with finite-horizon, optimal control of discrete-time systems with input-affine dynamics. The proposed algorithms involve discretization of the state and input spaces and are based on an alternative path that solves the dual problem corresponding to the DP operation. We provide error bounds for the proposed algorithms, along with a detailed analysis of their computational complexity. In particular, for a specific class of problems with separable data in the state and input variables, the proposed approach can reduce the typical time complexity of the DP operation from $O(XU)$ to $O(X+U)$, where $X$ and $U$ denote the size of the discrete state and input spaces, respectively. This reduction is achieved by an algorithmic transformation of the minimization in the DP operation to an addition via discrete conjugation.
Submitted 17 March, 2022; v1 submitted 24 August, 2020;
originally announced August 2020.
-
Macroscopic Noisy Bounded Confidence Models with Distributed Radical Opinions
Authors:
M. A. S. Kolarijani,
A. V. Proskurnikov,
P. Mohajerin Esfahani
Abstract:
In this article, we study the nonlinear Fokker-Planck (FP) equation that arises as a mean-field (macroscopic) approximation of bounded confidence opinion dynamics, where opinions are influenced by environmental noises and opinions of radicals (stubborn individuals). The distribution of radical opinions serves as an infinite-dimensional exogenous input to the FP equation, visibly influencing the steady opinion profile. We establish mathematical properties of the FP equation. In particular, we (i) show the well-posedness of the dynamic equation, (ii) provide an existence result accompanied by a quantitative global estimate for the corresponding stationary solution, and (iii) establish an explicit lower bound on the noise level that guarantees exponential convergence of the dynamics to the stationary state. Combining the results in (ii) and (iii) readily yields the input-output stability of the system for sufficiently large noises. Next, using Fourier analysis, the structure of opinion clusters under the uniform initial distribution is examined. Specifically, two numerical schemes for identification of the order-disorder transition and characterization of the initial clustering behavior are provided. The results of the analysis are validated through several numerical simulations of the continuum-agent model (partial differential equation) and the corresponding discrete-agent model (interacting stochastic differential equations) for a particular distribution of radicals.
Submitted 13 January, 2020; v1 submitted 10 May, 2019;
originally announced May 2019.