-
From Optimization to Control: Quasi Policy Iteration
Authors:
Mohammad Amin Sharifi Kolarijani,
Peyman Mohajerin Esfahani
Abstract:
Recent control algorithms for Markov decision processes (MDPs) have been designed using an implicit analogy with well-established optimization algorithms. In this paper, we make this analogy explicit across four problem classes with a unified solution characterization. This novel framework, in turn, allows for a systematic transformation of algorithms from one domain to the other. In particular, we identify equivalent optimization and control algorithms that have already been pointed out in the existing literature, but mostly in a scattered way. With this unifying framework in mind, we then exploit two linear structural constraints specific to MDPs for approximating the Hessian in a second-order-type algorithm from optimization, namely, Anderson mixing. This leads to a novel first-order control algorithm that modifies the standard value iteration (VI) algorithm by incorporating two new directions and adaptive step sizes. While the proposed algorithm, termed quasi-policy iteration, has the same computational complexity as VI, it exhibits an empirical convergence behavior similar to policy iteration, with a very low sensitivity to the discount factor.
Submitted 18 November, 2023;
originally announced November 2023.
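For readers less familiar with the baseline named in the abstract, the standard value iteration (VI) update can be sketched as follows. This is a minimal illustration of plain VI on a made-up two-state MDP, not the quasi-policy iteration algorithm proposed in the paper; the function name `value_iteration`, the layout `P[a][s][t]`, and the toy costs are assumptions chosen for illustration.

```python
def value_iteration(P, c, gamma, tol=1e-10, max_iter=10_000):
    """Plain VI for a finite, discounted-cost MDP.

    P[a][s][t]: probability of moving from state s to t under action a (assumed layout).
    c[s][a]:    stage cost of taking action a in state s (assumed layout).
    """
    n_states, n_actions = len(c), len(c[0])
    v = [0.0] * n_states
    for _ in range(max_iter):
        # Bellman update: minimize stage cost plus discounted expected cost-to-go.
        v_new = [
            min(
                c[s][a] + gamma * sum(P[a][s][t] * v[t] for t in range(n_states))
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
        if max(abs(x - y) for x, y in zip(v_new, v)) < tol:
            return v_new
        v = v_new
    return v


# Hypothetical toy problem: state 1 is absorbing and free; in state 0,
# action 0 stays (cost 1) and action 1 jumps to state 1 (cost 2).
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0
    [[0.0, 1.0], [0.0, 1.0]],  # action 1
]
c = [[1.0, 2.0], [0.0, 0.0]]
v_star = value_iteration(P, c, gamma=0.9)
```

Each sweep costs on the order of (states squared times actions) operations for a dense transition model, and the contraction factor equals the discount factor, which is why plain VI slows down as the discount factor approaches one.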
-
Adaptive Composite Online Optimization: Predictions in Static and Dynamic Environments
Authors:
Pedro Zattoni Scroccaro,
Arman Sharifi Kolarijani,
Peyman Mohajerin Esfahani
Abstract:
In the past few years, Online Convex Optimization (OCO) has received notable attention in the control literature thanks to its flexible real-time nature and powerful performance guarantees. In this paper, we propose new step-size rules and OCO algorithms that simultaneously exploit gradient predictions, function predictions and dynamics, features particularly pertinent to control applications. The proposed algorithms enjoy static and dynamic regret bounds in terms of the dynamics of the reference action sequence, gradient prediction error, and function prediction error, which are generalizations of known regularity measures from the literature. We present results for both convex and strongly convex costs. We validate the performance of the proposed algorithms in a trajectory tracking case study, as well as portfolio optimization using real-world datasets.
Submitted 14 January, 2023; v1 submitted 1 May, 2022;
originally announced May 2022.
-
Multimode Diagnosis for Switched Affine Systems with Noisy Measurement
Authors:
Jingwei Dong,
Arman Sharifi Kolarijani,
Peyman Mohajerin Esfahani
Abstract:
We study a diagnosis scheme to reliably detect the active mode of discrete-time, switched affine systems in the presence of measurement noise and asynchronous switching. The proposed scheme consists of two parts: (i) the construction of a bank of filters, and (ii) the introduction of a residual/threshold-based diagnosis rule. We develop an exact finite optimization-based framework to numerically solve for an optimal bank of filters in which the contribution of measurement noise to the residual is minimized. The design problem is safely approximated through linear matrix inequalities and thus becomes tractable. We further propose a thresholding policy along with probabilistic false-alarm guarantees to estimate the active system mode in real time. In comparison with the existing results, the guarantees improve from a polynomial dependency on the probability of false alarm to a logarithmic one. This improvement is achieved under the additional assumption of sub-Gaussianity, which is expected in many applications. The performance of the proposed approach is validated through a numerical example and an application to a building radiant system.
Submitted 30 December, 2022; v1 submitted 21 October, 2021;
originally announced October 2021.
-
Fast Approximate Dynamic Programming for Infinite-Horizon Markov Decision Processes
Authors:
M. A. S. Kolarijani,
G. F. Max,
P. Mohajerin Esfahani
Abstract:
In this study, we consider the infinite-horizon, discounted cost, optimal control of stochastic nonlinear systems with separable cost and constraints in the state and input variables. Using the linear-time Legendre transform, we propose a novel numerical scheme for implementation of the corresponding value iteration (VI) algorithm in the conjugate domain. Detailed analyses of the convergence, time complexity, and error of the proposed algorithm are provided. In particular, with a discretization of size $X$ and $U$ for the state and input spaces, respectively, the proposed approach reduces the time complexity of each iteration in the VI algorithm from $O(XU)$ to $O(X+U)$, by replacing the minimization operation in the primal domain with a simple addition in the conjugate domain.
Submitted 17 March, 2022; v1 submitted 17 February, 2021;
originally announced February 2021.
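The phrase "replacing the minimization operation in the primal domain with a simple addition in the conjugate domain" rests on a standard fact from convex analysis: the conjugate of an infimal convolution is the sum of the conjugates. The sketch below checks this identity numerically on small integer grids. It is only an illustration of the underlying identity under assumed toy functions, not the paper's algorithm; in particular, the conjugates here are computed by brute force rather than with the linear-time Legendre transform, and all names (`conjugate`, `inf_convolution`, the grids, and the two functions) are hypothetical.

```python
def conjugate(points, values, slopes):
    """Brute-force discrete conjugate: f*(s) = max_x [s*x - f(x)]."""
    return [max(s * x - fx for x, fx in zip(points, values)) for s in slopes]


def inf_convolution(u, f, v, g):
    """h(x) = min over pairs with u_i + v_j = x of f(u_i) + g(v_j).

    Assumes u and v are consecutive integer grids, so x covers every sum.
    """
    x0 = u[0] + v[0]
    x = list(range(x0, u[-1] + v[-1] + 1))
    h = [float("inf")] * len(x)
    for ui, fi in zip(u, f):
        for vj, gj in zip(v, g):
            k = ui + vj - x0
            h[k] = min(h[k], fi + gj)  # the O(X*U)-style minimization
    return x, h


# Assumed toy data: a quadratic and an absolute-value function.
u = list(range(-5, 6))
v = list(range(-10, 11))
f = [0.5 * ui**2 for ui in u]
g = [float(abs(vj)) for vj in v]
slopes = [0.5 * k for k in range(-6, 7)]

x, h = inf_convolution(u, f, v, g)
lhs = conjugate(x, h, slopes)                      # conjugate of the minimization
rhs = [a + b for a, b in zip(conjugate(u, f, slopes),
                             conjugate(v, g, slopes))]  # plain addition
max_gap = max(abs(a - b) for a, b in zip(lhs, rhs))
```

The two sides agree up to floating-point error, so once the conjugates are available the pairwise minimization can be traded for an elementwise sum; the complexity gain claimed in the abstract additionally depends on computing those conjugates in linear time.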
-
Fast Approximate Dynamic Programming for Input-Affine Dynamics
Authors:
M. A. S. Kolarijani,
P. Mohajerin Esfahani
Abstract:
We propose two novel numerical schemes for approximate implementation of the dynamic programming (DP) operation concerned with finite-horizon, optimal control of discrete-time systems with input-affine dynamics. The proposed algorithms involve discretization of the state and input spaces and are based on an alternative path that solves the dual problem corresponding to the DP operation. We provide error bounds for the proposed algorithms, along with a detailed analysis of their computational complexity. In particular, for a specific class of problems with separable data in the state and input variables, the proposed approach can reduce the typical time complexity of the DP operation from $O(XU)$ to $O(X+U)$, where $X$ and $U$ denote the size of the discrete state and input spaces, respectively. This reduction is achieved by an algorithmic transformation of the minimization in the DP operation to an addition via discrete conjugation.
Submitted 17 March, 2022; v1 submitted 24 August, 2020;
originally announced August 2020.
-
Macroscopic Noisy Bounded Confidence Models with Distributed Radical Opinions
Authors:
M. A. S. Kolarijani,
A. V. Proskurnikov,
P. Mohajerin Esfahani
Abstract:
In this article, we study the nonlinear Fokker-Planck (FP) equation that arises as a mean-field (macroscopic) approximation of bounded confidence opinion dynamics, where opinions are influenced by environmental noises and opinions of radicals (stubborn individuals). The distribution of radical opinions serves as an infinite-dimensional exogenous input to the FP equation, visibly influencing the steady opinion profile. We establish mathematical properties of the FP equation. In particular, we (i) show the well-posedness of the dynamic equation, (ii) provide an existence result accompanied by a quantitative global estimate for the corresponding stationary solution, and (iii) establish an explicit lower bound on the noise level that guarantees exponential convergence of the dynamics to the stationary state. Combining the results in (ii) and (iii) readily yields the input-output stability of the system for sufficiently large noises. Next, using Fourier analysis, the structure of opinion clusters under the uniform initial distribution is examined. Specifically, two numerical schemes for identification of the order-disorder transition and characterization of the initial clustering behavior are provided. The results of the analysis are validated through several numerical simulations of the continuum-agent model (partial differential equation) and the corresponding discrete-agent model (interacting stochastic differential equations) for a particular distribution of radicals.
Submitted 13 January, 2020; v1 submitted 10 May, 2019;
originally announced May 2019.
-
A Decentralized Event-Based Approach for Robust Model Predictive Control
Authors:
Arman Sharifi Kolarijani,
Sander Bregman,
Peyman Mohajerin Esfahani,
Tamas Keviczky
Abstract:
In this paper, we propose an event-based sampling policy to implement a constraint-tightening, robust MPC method. The proposed policy enjoys a computationally tractable design and is applicable to perturbed, linear time-invariant systems with polytopic constraints. In particular, the triggering mechanism is suitable for plants with no centralized sensory node as the triggering mechanism can be evaluated locally at each individual sensor. From a geometrical viewpoint, the mechanism is a sequence of hyper-rectangles surrounding the optimal state trajectory such that robust recursive feasibility and robust stability are guaranteed. The design of the triggering mechanism is cast as a constrained parametric-in-set optimization problem with the volume of the set as the objective function. Re-parameterized in terms of the set vertices, we show that the problem admits a finite tractable convex program reformulation and a linear program relaxation. Several numerical examples are presented to demonstrate the effectiveness and limitations of the theoretical results.
Submitted 22 September, 2019; v1 submitted 30 November, 2018;
originally announced November 2018.
-
Continuous-Time Accelerated Methods via a Hybrid Control Lens
Authors:
Arman Sharifi Kolarijani,
Peyman Mohajerin Esfahani,
Tamás Keviczky
Abstract:
The practice of treating optimization methods as dynamical systems, in order to understand their notions and behaviors, can be traced back centuries. Lately, this mindset has become the driving force for designing new optimization methods. Inspired by the recent dynamical system viewpoint of Nesterov's fast method, we propose two classes of fast methods, formulated as hybrid control systems, to obtain a pre-specified exponential convergence rate. As an alternative to the existing fast methods, which are parametric-in-time second-order differential equations, we dynamically synthesize feedback controls in a state-dependent manner. Namely, in the first class the damping term is viewed as the control input, while in the second class the amplitude with which the gradient of the objective function impacts the dynamics serves as the controller. The objective function is required to satisfy the so-called Polyak--Łojasiewicz inequality, which effectively implies no local optima and a certain gradient-domination property. Moreover, we establish that both hybrid structures possess Zeno-free solution trajectories. We finally provide a mechanism to determine the discretization step size to attain an exponential convergence rate.
Submitted 23 September, 2019; v1 submitted 20 July, 2018;
originally announced July 2018.
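For reference, the Polyak--Łojasiewicz inequality and the "parametric-in-time" second-order dynamics mentioned in the abstract are commonly written as below. This is a sketch in standard textbook notation, which is an assumption here: the symbols $\mu$ and $f^\star$ and the coefficient $3/t$ are the usual ones from the literature, and the paper's own formulation and constants may differ.

```latex
% Polyak--Lojasiewicz (PL) inequality with constant \mu > 0 and optimal value f^\star:
\frac{1}{2}\,\|\nabla f(x)\|^{2} \;\ge\; \mu\,\bigl(f(x) - f^{\star}\bigr)
\quad \text{for all } x.

% Continuous-time model commonly associated with Nesterov's fast method,
% whose damping coefficient 3/t is a fixed function of time:
\ddot{x}(t) + \frac{3}{t}\,\dot{x}(t) + \nabla f\bigl(x(t)\bigr) = 0.

% Why PL yields exponential rates: along the gradient flow \dot{x} = -\nabla f(x),
\frac{d}{dt}\bigl(f(x(t)) - f^{\star}\bigr) = -\|\nabla f(x(t))\|^{2}
\;\le\; -2\mu\,\bigl(f(x(t)) - f^{\star}\bigr)
\;\Longrightarrow\;
f(x(t)) - f^{\star} \le e^{-2\mu t}\,\bigl(f(x(0)) - f^{\star}\bigr).
```

The first line also shows why every stationary point is a global minimizer under PL: a zero gradient forces $f(x) = f^\star$. The last line is the gradient-flow case only; it illustrates the gradient-domination mechanism and is not one of the hybrid schemes the abstract proposes.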