
On the Feedback Law in Stochastic Optimal Nonlinear Control

Mohamed Naveed Gul Mohamed, Suman Chakravorty, Raman Goyal, and Ran Wang

The authors are with the Department of Aerospace Engineering, Texas A&M University, College Station, TX 77843 USA. {naveed, schakrav, ramaniitrgoyal92, rwang0417}@tamu.edu
Abstract

We consider the problem of nonlinear stochastic optimal control. This problem is thought to be fundamentally intractable owing to Bellman’s “curse of dimensionality”. We present a result that shows that repeatedly solving an open-loop deterministic problem from the current state with progressively shorter horizons, similar to Model Predictive Control (MPC), results in a feedback policy that is $O(\epsilon^{4})$ near to the true global stochastic optimal policy, where $\epsilon$ is a perturbation parameter modulating the noise. We show that the optimal deterministic feedback problem has a perturbation structure in that higher-order terms of the feedback law do not affect lower-order terms, and that this structure is lost in the optimal stochastic feedback problem. Consequently, solving the Stochastic Dynamic Programming problem is highly susceptible to noise, even when tractable, and in practice, the MPC-type feedback law offers superior performance even for stochastic systems.

Index Terms:
Stochastic Optimal Control, Nonlinear Systems, Model Predictive Control.

I INTRODUCTION

In this paper, we consider the problem of finite-time nonlinear stochastic optimal control, specifically the stochastic dynamical system:

\[
dx=(f(x)+g(x)u)\,dt+\epsilon\,dw,
\]

where $w$ is a Wiener process, $\epsilon$ is a perturbation parameter modulating the noise, and the cost to be optimized is $J^{\pi}=\mathbb{E}\left[\int_{0}^{T}c(x_{t},\pi_{t}(x_{t}))\,dt+c_{T}(x_{T})\right]$, where the incremental cost has the form $c(x,u)=l(x)+\frac{1}{2}{u}^{\mathsf{T}}Ru$, $c_{T}(x_{T})$ is the terminal cost, $\pi_{t}(x_{t})$ is a control policy, and the cost is minimized over all such policies.
We present a result that establishes that repeatedly solving a deterministic optimal control, or open-loop, problem from the current state, which is shown to be equivalent to applying the deterministic feedback policy to the system, results in a feedback policy that is $O(\epsilon^{4})$ near-optimal with respect to the optimal stochastic feedback policy, in terms of the small noise parameter $\epsilon$. Our analysis shows that under the relatively mild conditions of control-affine dynamics and a cost that is quadratic in the control, the open-loop solution obtained by satisfying the Minimum Principle [bryson] is globally optimal. Further, the deterministic feedback law has a perturbation structure, in the sense that the higher-order feedback terms do not affect the lower-order terms, and this structure is lost in the optimal stochastic problem. We obtain the equations that need to be satisfied by the linear and higher-order feedback terms in the optimal feedback law. Although it is only near-optimal, empirical evidence shows that this replanning-based Model Predictive Control (MPC)-type policy is the best we can do in practice: although the optimal stochastic law should, in theory, perform better, solving the stochastic problem is highly susceptible to noise owing to its lack of the perturbation structure, and in practice the MPC-type law gives better performance. Thus, this result resolves the trade-off between tractability and optimality in stochastic feedback control problems, showing that, in practice, “what is tractable is also optimal”. In this paper, we consider the case where an analytical model is available for the control synthesis; the case of data-based control is considered in a companion paper [wang2022search].
A final note here is that the system considered in this work is not the most general, and our goal is not to analyze the most general case. Rather, we show that even in the simplest case considered here, the stochastic problem is fundamentally intractable, and the best we can do in practice is to use the deterministic feedback law, which is implemented by re-planning the open-loop solution as necessary.

A large majority of sequential decision-making problems under uncertainty can be posed as a nonlinear stochastic optimal control problem that requires the solution of an associated Dynamic Programming (DP) problem. However, the computational complexity grows exponentially in the state dimension [bertsekas1]: a manifestation of Bellman’s so-called “curse of dimensionality (CoD)” [bellman]. In Approximate DP (ADP) or, alternatively, Reinforcement Learning (RL), simulations of the process under a policy are used to approximate the cost-to-go function by sampling the domain [parr3, bertsekas1]. However, as the dimension $d$ increases, the number of samples required for evaluation grows exponentially. There has been recent success using the Deep RL paradigm, where deep neural networks are used as nonlinear function approximators to keep the parametrization tractable [RLHD1, haarnoja2018soft, fujimoto2018addressing, RLHD4, RLHD5]; however, the training times required for these approaches, and the variance of the solutions, are still prohibitive. Hence, the primary problem with ADP/RL techniques is the CoD inherent in the complex representation of the cost-to-go function, and the exponentially large number of evaluations required for its estimation, which results in high solution variance and makes them unreliable and inaccurate.

In the case of continuous state, control, and observation space problems, the Model Predictive Control [Mayne_1, Mayne_2] approach has been used with a lot of success in the control systems and robotics communities. For deterministic systems, the process results in solving the original DP problem in a recursive online fashion. However, stochastic control problems, and the control of uncertain systems in general, remain unresolved in MPC. As succinctly noted in [Mayne_1], the problem arises because, in stochastic control problems, the MPC optimization at every time step cannot be over deterministic control sequences, but rather has to be over feedback policies, which is, in general, difficult to accomplish since a tractable parameterization of such policies over which to perform the optimization is typically unavailable. Thus, the tube-based MPC approach, and its stochastic counterparts, typically consider linear systems [T-MPC1, T-MPC2, T-MPC3], for which a linear parametrization of the feedback policy suffices, but the methods become intractable when dealing with nonlinear systems [Mayne_3]. In more recent work, event-triggered MPC [ETMPC1, ETMPC2] keeps the online planning computationally efficient by triggering replanning in an event-driven fashion rather than at every time step. We note that event-triggered MPC inherits the same issues mentioned above with respect to the stochastic control problem, and consequently, the techniques are intractable for nonlinear systems. There has been recent work showing the near-optimality of MPC with a perturbation analysis [bounded_regret_mpc, bounded_regret_ltv, superconvergence_mpc], but this work considers a deterministic problem setting with unknown model parameters in the system dynamics, and the regret bound provided is with respect to the controller that has perfect knowledge of the model. In contrast, we show the near-optimality of the deterministic feedback law with respect to the optimal stochastic law.

The fundamental problem is that, although solving the open-loop problem via the Minimum Principle (MP) is much easier, solving for the optimal feedback control under uncertainty requires the solution of the DP equation, which is intractable. Moreover, this also raises the question: since all systems are subject to uncertainty, what is the utility of deterministic optimal control?
Contributions: In this work, we establish that the basic MPC-type approach of solving the deterministic open-loop problem (with progressively shorter horizons) at every time step results in a near-optimal policy, to $O(\epsilon^{4})$, for a nonlinear stochastic system. The result uses a perturbation expansion of the cost-to-go function in terms of a perturbation parameter $\epsilon$. We show the global optimality of the open-loop solution obtained by satisfying the Minimum Principle [bryson] using the classical Method of Characteristics [Courant-Hilbert], thereby establishing that the MPC feedback law is indeed the optimal deterministic feedback law. Further, we show that the deterministic feedback law has a perturbation structure that is lost in the stochastic problem. We obtain the true linear and higher-order feedback gain equations of the optimal deterministic policy as a by-product, which are very different from the Riccati equation governing a typical LQR perturbation feedback design [bryson]. Finally, although the MPC law is only “near-optimal”, our empirical evidence shows that this deterministic law has better performance than the stochastic law obtained by solving the stochastic DP problem computationally, which shows the susceptibility of the stochastic DP problem to noise owing to the loss of the perturbation structure, quite apart from the usual curse of dimensionality. Thus, in practice, the MPC law is the best one can do. In contrast to [parunandi2019TPFC], we show fourth-order near-optimality to the optimal stochastic solution, the global optimality of the open-loop solution and the perturbation structure of the deterministic feedback law, without which MPC is heuristic, and analytical as well as empirical evidence regarding the superiority of MPC to stochastic DP even when the DP problem is not subject to the curse of dimensionality.
The current manuscript expands on our previously published conference paper [mohamed2022acc]. In particular, we provide detailed proofs of all our developments, explain the loss of perturbation structure in the stochastic problem, and provide a comprehensive empirical evaluation of the theory proposed in this manuscript.

The rest of the document is organized as follows: Section II states the problem, Section III presents three fundamental results that represent the three legs of the stool that supports the fact that the MPC feedback law is near-optimal, which is established in Section IV. We illustrate our results numerically in Section V using a simple 1-dimensional example for which the stochastic DP problem can be solved, and more practical examples from nonlinear robotic planning.

II PRELIMINARIES

The following outlines the finite time stochastic optimal control problem formulation, and the associated deterministic problem, along with the associated Dynamic Programming (DP) problems that we shall study in this work.

System Model

For a dynamical system, we denote the state and control vectors by $x\in\mathbb{X}\subset\mathbb{R}^{n_{x}}$ and $u\in\mathbb{U}\subset\mathbb{R}^{n_{u}}$, respectively. The dynamics of the system are governed by the stochastic differential equation (SDE):

\begin{equation}
dx=(\mathcal{F}(x)+\mathcal{G}(x)u)\,dt+\epsilon\,dw, \tag{1}
\end{equation}

where $w\in\mathbb{R}^{n_{x}}$ is a Wiener process with covariance $Q\in\mathbb{R}^{n_{x}\times n_{x}}$, and $\epsilon$ is a small parameter that modulates the amplitude of the noise entering the system and hence affects the signal-to-noise ratio.

1-Dimensional/Scalar case: For the sake of simplicity in presenting the results, we will consider the scalar or 1-dimensional version of the problem, i.e., $n_{x}=n_{u}=1$. The final results for the vector case will also be provided. The dynamics of the system for the scalar case are given by the following SDE:

\begin{equation}
dx=(f(x)+g(x)u)\,dt+\epsilon\,dw, \tag{2}
\end{equation}

where $f(x)$ and $g(x)$ are the 1-dimensional equivalents of $\mathcal{F}(x)$ and $\mathcal{G}(x)$, respectively.

Stochastic optimal control problem

The stochastic optimal control problem for an initial state $x_{0}$ is defined as:

\begin{equation}
J^{\pi^{*}}(x_{0})=\min_{\Pi}\ \mathbb{E}\left[\int_{0}^{T}c(x_{t},\pi_{t}(x_{t}))\,dt+c_{T}(x_{T})\right], \tag{3}
\end{equation}

subject to the SDE (1), where the optimization is over a family of time-varying feedback policies $\Pi:=\{\pi_{t}(x);\ t\in[0,T]\}$; $J^{\pi^{*}}(\cdot):\mathbb{X}\rightarrow\mathbb{R}$ is the cost function on applying the optimal policy $\pi^{*}$; $c(\cdot,\cdot):\mathbb{X}\times\mathbb{U}\rightarrow\mathbb{R}$ is the incremental cost function; $c_{T}(\cdot):\mathbb{X}\rightarrow\mathbb{R}$ is the terminal cost function; and $T$ is the “finite time horizon” of the problem.

Assumptions

We shall make the following assumptions in the rest of the paper, and unless otherwise stated, all results assume the following.

Assumption 1

(A1) Cost Structure. We assume that the incremental cost $c(x,u)$ is quadratic in the control variable, i.e., $c(x,u)=l(x)+\frac{1}{2}{u}^{\mathsf{T}}Ru$, with $R$ positive definite. The matrix $R$ will be replaced by $r$ for the scalar case.

Assumption 2

(A2) Smoothness. We shall also assume that all the involved functions, $\mathcal{F}(x),\mathcal{G}(x)$ ($f(x),g(x)$ for the scalar case), $l(x)$, $c_{T}(x)$, and $\pi_{t}(x)$, are five times continuously differentiable ($\mathcal{C}^{5}$) in their arguments.

II-A Stochastic Dynamic Programming

The continuous time DP or the stochastic Hamilton-Jacobi-Bellman (HJB) equation for the system in Eq. (1) is given by [OPT_todorov]

\begin{equation}
-\frac{\partial J}{\partial t}=\min_{u}H(x,u)+\frac{\epsilon^{2}}{2}\sum_{i}\sum_{j}\frac{\partial^{2}J}{\partial x_{i}\partial x_{j}}Q_{ij}, \tag{4}
\end{equation}

where $J=J(t,x)$, $J(T,x)=c_{T}(x)$ is the terminal condition, $H(x,u)=l(x)+\frac{1}{2}{u}^{\mathsf{T}}Ru+{\frac{\partial J}{\partial x}}^{\mathsf{T}}(\mathcal{F}(x)+\mathcal{G}(x)u)$ is the Hamiltonian of the system, and $Q=[Q_{ij}]$ is the intensity of the vector Wiener process.
Let $u(t,x)$ denote the corresponding optimal policy. Then, it is sufficient that the optimal control $u$ satisfies the first-order necessary condition (since the Hamiltonian $H(x,u)$ is a strictly convex quadratic in $u$, as $R$ is positive definite):

\begin{equation}
u=-R^{-1}{\mathcal{G}(x)}^{\mathsf{T}}J^{x},~\text{where}~J^{x}=\frac{\partial J(t,x)}{\partial x}. \tag{5}
\end{equation}
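For completeness, the step from the Hamiltonian to Eq. (5) can be written out as follows. This is a short sketch that uses only the definitions above, with the additional (standard, but here assumed) convention that the positive definite matrix $R$ is symmetric. The last line, the minimized Hamiltonian, is obtained by substituting the minimizer back into $H(x,u)$.

```latex
\begin{align*}
\frac{\partial H}{\partial u}
  &= Ru+{\mathcal{G}(x)}^{\mathsf{T}}J^{x}=0
  \quad\Longrightarrow\quad
  u=-R^{-1}{\mathcal{G}(x)}^{\mathsf{T}}J^{x},\\
\frac{\partial^{2}H}{\partial u^{2}}
  &= R\succ 0
  \quad\text{(so the stationary point is the unique minimizer)},\\
\min_{u}H(x,u)
  &= l(x)+{J^{x}}^{\mathsf{T}}\mathcal{F}(x)
     -\frac{1}{2}{J^{x}}^{\mathsf{T}}\mathcal{G}(x)R^{-1}{\mathcal{G}(x)}^{\mathsf{T}}J^{x}.
\end{align*}
```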

II-B The Deterministic Problem

Let us now consider the deterministic problem, i.e., Eq. (1) with $\epsilon=0$ and the same cost as in (3), except that there is no expectation owing to the lack of stochasticity. Utilizing essentially identical arguments as for the stochastic case, the optimal cost-to-go of the deterministic system, $\phi(t,x)$, satisfies the deterministic HJB equation:

\begin{equation}
-\frac{\partial\phi}{\partial t}=\min_{u}H(x,u), \tag{6}
\end{equation}

where the terminal condition is $\phi(T,x)=c_{T}(x)$, and the Hamiltonian $H(x,u)=l(x)+\frac{1}{2}{u}^{\mathsf{T}}Ru+{\frac{\partial\phi}{\partial x}}^{\mathsf{T}}(\mathcal{F}(x)+\mathcal{G}(x)u)$ has exactly the same form as in the stochastic problem, with $\phi$ in place of $J$; the only difference is the missing diffusion term $\frac{\epsilon^{2}}{2}\sum_{i}\sum_{j}\frac{\partial^{2}J}{\partial x_{i}\partial x_{j}}Q_{ij}$ of Eq. (4). Finally, identical to the stochastic case, the optimal control in the deterministic case is given by:

\begin{equation}
u^{d}=-R^{-1}{\mathcal{G}(x)}^{\mathsf{T}}\phi^{x},~\text{where}~\phi^{x}=\frac{\partial\phi}{\partial x}. \tag{7}
\end{equation}

III A PERTURBATION ANALYSIS OF OPTIMAL FEEDBACK CONTROL

In the following four subsections, we establish four basic results that we shall use to establish the near-optimality of the MPC law in Section IV. In Section III-A, we characterize the performance of any given feedback policy as a perturbation (series) expansion in the parameter $\epsilon$. We establish that the $O(\epsilon^{0})$ term depends only on the nominal action, while the $O(\epsilon^{2})$ term depends only on the linear part of the feedback law. In Section III-B, we find the differential equations satisfied by these different perturbation costs using the HJB equation and show that the stochastic and deterministic optimal feedback laws share the same nominal and first order costs. In Section III-C, we analyze the nominal/open-loop problem using the Method of Characteristics and show that the open-loop optimal control has a unique global minimum. Also, we show that the deterministic optimal feedback control problem has a perturbation structure in that the higher order terms do not affect the lower order terms in a Taylor expansion of the optimal feedback law, and obtain the equations governing the optimal linear feedback term in the nonlinear problem, which is shown to be different from a traditional LQR design [bryson]. In Section III-D, we show that this perturbation structure is lost for the stochastic problem leading to a fundamental computational intractability, quite apart from the usual curse of dimensionality.

III-A Characterizing the Performance of a Feedback Policy

In order to derive the results in this section, we first discretize the SDE in Eq. (1) via a Forward Euler approximation [kloedon_numerical_sde, Ch.9] with discretization time $\Delta t$:

\begin{equation}
x_{k+1}=x_{k}+(\mathcal{F}(x_{k})+\mathcal{G}(x_{k})u_{k})\Delta t+\epsilon w_{k}\sqrt{\Delta t}+o(\Delta t), \tag{8}
\end{equation}

where $\epsilon<1$ is a perturbation parameter, $w_{k}$ is a white noise sequence with covariance $Q=I_{n_{x}\times n_{x}}$, $k=0,1,\cdots,N$, where $N=T/\Delta t$, and $\|o(\Delta t)\|\rightarrow 0$ as $\Delta t\rightarrow 0$. At the end of this Section, we will obtain the continuous time result by letting $\Delta t\rightarrow 0$. For notational convenience, we shall not explicitly write the $o(\Delta t)$ term in the following.

Let us also consider a noiseless version of the system dynamics given by (8), obtained by setting $w_{k}=0$ for all $k$: $\bar{x}_{k+1}=\bar{x}_{k}+(\mathcal{F}(\bar{x}_{k})+\mathcal{G}(\bar{x}_{k})\bar{u}_{k})\Delta t$, where we denote the “nominal” state trajectory as $\bar{x}_{k}$ and the “nominal” control as $\bar{u}_{k}$, with $\bar{u}_{k}=\pi_{k\Delta t}(\bar{x}_{k})$, where $\{\pi_{k\Delta t}(\cdot),\ k=0,1,\cdots,N\}$ is a discretization of a given continuous-time control policy $\Pi=\{\pi_{t}(x),\ t\in[0,T]\}$.
In the following, to simplify notation, we shall drop the explicit reference to the discretization time $\Delta t$ and denote the discretized policy as $\{\pi_{k}(x)\},\ k=0,1,\cdots,N$.
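As an illustration of the discretization in Eq. (8) and of the nominal trajectory, the following minimal Python sketch rolls out the scalar system under a given feedback policy, once with noise and once with $w_{k}=0$. The drift `f`, input gain `g`, and policy `policy` are hypothetical choices made only for this example; they are not taken from the paper, and the $o(\Delta t)$ term is dropped.

```python
import numpy as np

def simulate_euler(f, g, policy, x0, T, dt, eps, rng):
    """Forward-Euler rollout of dx = (f(x) + g(x) u) dt + eps dw (scalar case)."""
    n_steps = int(round(T / dt))          # N = T / dt
    xs = np.empty(n_steps + 1)
    xs[0] = x0
    for k in range(n_steps):
        u = policy(k * dt, xs[k])         # u_k = pi_{k dt}(x_k)
        w = rng.standard_normal()         # unit-variance white noise sample w_k
        xs[k + 1] = xs[k] + (f(xs[k]) + g(xs[k]) * u) * dt + eps * w * np.sqrt(dt)
    return xs

# Hypothetical model and policy, for illustration only.
f = lambda x: -x**3
g = lambda x: 1.0
policy = lambda t, x: -x

rng = np.random.default_rng(0)
# eps = 0 removes the noise term, which gives the nominal trajectory xbar_k.
nominal = simulate_euler(f, g, policy, x0=1.0, T=1.0, dt=0.01, eps=0.0, rng=rng)
# eps > 0 gives one sample path of the perturbed closed-loop system.
noisy = simulate_euler(f, g, policy, x0=1.0, T=1.0, dt=0.01, eps=0.1, rng=rng)
```

The difference `noisy - nominal` is one sample path of the state deviation about the nominal trajectory under these assumed dynamics.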

Assuming that $\mathcal{F}(\cdot)$, $\mathcal{G}(\cdot)$ and $\pi_{k}(\cdot)$ are sufficiently smooth (assumption A2), we can expand the dynamics about the nominal trajectory using a Taylor series. Denoting $\delta x_{k}=x_{k}-\bar{x}_{k}$ and $\delta u_{k}=u_{k}-\bar{u}_{k}$, we can write

\begin{align}
\delta x_{k+1} &= A_{k}\delta x_{k}+B_{k}\delta u_{k}+S_{k}(\delta x_{k})+\epsilon w_{k}\sqrt{\Delta t}, \tag{9}\\
\delta u_{k} &= K_{k}\delta x_{k}+\tilde{S}_{k}(\delta x_{k}), \tag{10}
\end{align}

where $A_k = I_{n_x \times n_x} + \frac{\partial (\mathcal{F}(x) + \mathcal{G}(x)u)\Delta t}{\partial x}|_{\bar{x}_k, \bar{u}_k}$,
$B_k = \frac{\partial (\mathcal{F}(x) + \mathcal{G}(x)u)\Delta t}{\partial u}|_{\bar{x}_k, \bar{u}_k} = \mathcal{G}(\bar{x}_k)\Delta t$, $K_k = \frac{\partial \pi_k}{\partial x}|_{\bar{x}_k}$, and $S_k(\cdot), \tilde{S}_k(\cdot)$ are second and higher order terms in the respective expansions.
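As a quick numerical illustration of these Jacobians, the following minimal sketch compares central finite differences of one noise-free discretized step against the closed forms for $A_k$ and $B_k$. The scalar model `F`, `G`, the step `dt`, and the nominal point `x_bar`, `u_bar` are hypothetical choices made only for this example, and the update `x + (F(x) + G(x) u) dt` is assumed to be the discretization behind the definitions above.

```python
import numpy as np

# Hypothetical scalar model, chosen only for illustration (not from the paper).
def F(x):
    return np.sin(x)

def G(x):
    return 1.0  # constant input gain

dt = 0.1                  # assumed discretization time
x_bar, u_bar = 0.3, 0.2   # assumed nominal state and control

def step(x, u):
    """One noise-free step of the assumed discretized dynamics."""
    return x + (F(x) + G(x) * u) * dt

# Central finite differences about the nominal point.
h = 1e-5
A_fd = (step(x_bar + h, u_bar) - step(x_bar - h, u_bar)) / (2 * h)
B_fd = (step(x_bar, u_bar + h) - step(x_bar, u_bar - h)) / (2 * h)

# Closed forms: A_k = 1 + d(F + G u)/dx * dt, B_k = G(x_bar) * dt.
A_exact = 1.0 + np.cos(x_bar) * dt
B_exact = G(x_bar) * dt

print(A_fd, A_exact, B_fd, B_exact)
```

For a vector state the same check applies column by column, perturbing one coordinate at a time.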

Using (9) and (10), we can write the closed-loop dynamics of the trajectory $(\delta x_k)_{k=1}^{N}$ as,

\begin{align}
\delta x_{k+1} = \underbrace{(A_k + B_k K_k)}_{\bar{A}_k} \delta x_k &+ \underbrace{B_k \tilde{S}_k(\delta x_k) + S_k(\delta x_k)}_{\bar{S}_k(\delta x_k)} \nonumber\\
&+ \epsilon w_k \sqrt{\Delta t}, \tag{11}
\end{align}

where $\bar{A}_k$ represents the linear part of the closed-loop system and the term $\bar{S}_k(\cdot)$ represents the second and higher order terms in the closed-loop system.

Similarly, we can expand the instantaneous cost $c(x_k, u_k)$ about the nominal values $c(\bar{x}_k, \bar{u}_k)$ as,

\begin{align}
c(x_k, u_k)\Delta t &= \Big( l(\bar{x}_k) + L_k \delta x_k + H_k(\delta x_k) + \nonumber\\
&\quad \frac{1}{2}{\bar{u}_k}^{\mathsf{T}} R \bar{u}_k + {\delta u_k}^{\mathsf{T}} R \bar{u}_k + \frac{1}{2}{\delta u_k}^{\mathsf{T}} R \delta u_k \Big)\Delta t, \tag{12}\\
c_T(x_N) &= c_T(\bar{x}_N) + C_T \delta x_N + H_T(\delta x_N), \tag{13}
\end{align}

where $L_k = \frac{\partial l}{\partial x}|_{\bar{x}_k}$, $C_T = \frac{\partial c_T}{\partial x}|_{\bar{x}_N}$, and $H_k(\cdot)$ and $H_T(\cdot)$ are second and higher order terms in the respective expansions. The closed-loop incremental cost given in (12) can be expressed as

\begin{multline*}
c(x_k, u_k)\Delta t = \underbrace{\{ l(\bar{x}_k) + \frac{1}{2}{\bar{u}_k}^{\mathsf{T}} R \bar{u}_k \}\Delta t}_{\bar{c}_k} + \\
\underbrace{[L_k + {\bar{u}_k}^{\mathsf{T}} R K_k]\Delta t}_{\bar{C}_k} \delta x_k + \bar{H}_k(\delta x_k),
\end{multline*}

where $\bar{H}_k(\delta x_k)$ are the second and higher order terms. Therefore, the cumulative cost of any given closed-loop trajectory $(x_k, u_k)_{k=0}^{N}$ can be expressed as $\mathcal{J}^{\pi}(x_0) = \sum_{k=0}^{N} c(x_k, \pi_k(x_k))\Delta t + c_T(x_N)$, which can be written in the following form:

\begin{align}
\mathcal{J}^{\pi}(x_0) &= \sum_{k=0}^{N} \bar{c}_k + \sum_{k=0}^{N} \bar{C}_k \delta x_k + \sum_{k=0}^{N} \bar{H}_k(\delta x_k), \tag{14}
\end{align}

where $\bar{c}_N = c_T(\bar{x}_N)$, $\bar{C}_N = C_T$.

We first show the following critical result. Note: The proofs for the results shown here are given in the appendix.

Lemma 1

Given any sample path, the state perturbation equation given in (III-A) can be equivalently characterized as

\begin{align}
\delta x_k = \delta x_k^l + e_k, \quad \delta x_{k+1}^l = \bar{A}_k \delta x_k^l + \epsilon w_k \sqrt{\Delta t} \tag{15}
\end{align}

where $e_k$ is an $O(\epsilon^2)$ function that depends on the entire noise history $\{w_0, w_1, \cdots, w_k\}$ and $\delta x_k^l$ evolves according to the linear closed-loop system. Furthermore, $e_k = e_k^{(2)} + O(\epsilon^3)$, where $e_k^{(2)} = \bar{A}_{k-1} e_{k-1}^{(2)} + {\delta x_{k-1}^l}^{\mathsf{T}} \bar{S}_{k-1}^{(2)} \delta x_{k-1}^l$, $e_0^{(2)} = 0$, and $\bar{S}_k^{(2)}$ represents the Hessian matrix corresponding to the Taylor series expansion of the function $\bar{S}_k(\cdot)$.
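The $O(\epsilon^2)$ scaling of $e_k$ can be checked numerically on a toy problem. The sketch below is an illustration under stated assumptions, not the construction used in the proof: it takes a scalar closed loop with a hypothetical linear gain `a_bar` and a purely quadratic nonlinearity `s_bar * dx**2`, runs the full and the linear recursions on the same noise sample path, and compares the error at two noise levels. If $e_k$ is $O(\epsilon^2)$, halving $\epsilon$ should shrink the error by a factor close to 4.

```python
import numpy as np

def perturbation_error(eps, w, a_bar=0.9, s_bar=0.5, dt=0.01):
    """Return e_k = dx_k - dx_k^l along one noise sample path w.

    a_bar and s_bar are hypothetical scalar stand-ins for the linear
    closed-loop gain and the second-order coefficient of the nonlinearity.
    """
    dx, dx_lin = 0.0, 0.0
    e = np.zeros(len(w) + 1)
    for k, wk in enumerate(w):
        noise = eps * wk * np.sqrt(dt)
        dx = a_bar * dx + s_bar * dx**2 + noise   # full closed loop
        dx_lin = a_bar * dx_lin + noise           # linear closed loop
        e[k + 1] = dx - dx_lin
    return e

rng = np.random.default_rng(0)
w = rng.standard_normal(200)          # one fixed noise sample path

e_big = perturbation_error(0.01, w)
e_small = perturbation_error(0.005, w)

# Halving eps should reduce the error by roughly a factor of 4.
ratio = np.max(np.abs(e_big)) / np.max(np.abs(e_small))
print(ratio)
```

The ratio is not exactly 4 because $e_k$ also carries $O(\epsilon^3)$ terms, which fade as $\epsilon$ is reduced further.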

Next, we have the following result for the expansion of the cost-to-go function $\mathcal{J}^{\pi}(x_0)$.

Lemma 2

Given any sample path, the cost-to-go under a policy can be expanded about the nominal as:

\begin{align*}
\mathcal{J}^{\pi}(x_0) = &\underbrace{\sum_k \bar{c}_k}_{\bar{J}^{\pi}} + \underbrace{\sum_k \bar{C}_k \delta x_k^l}_{\delta J_1^{\pi}} + \underbrace{\sum_k {\delta x_k^l}^{\mathsf{T}} \bar{H}_k^{(2)} \delta x_k^l + \bar{C}_k e_k^{(2)}}_{\delta J_2^{\pi}}\\
&+ O(\epsilon^3),
\end{align*}

where $\bar{H}_k^{(2)}$ denotes the second order coefficient of the Taylor expansion of $\bar{H}_k(\cdot)$.
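To see why no $O(\epsilon)$ term survives in the mean cost, it may help to take the expectation of the first-order term directly. The following is a sketch under assumptions not restated in this section: the noise $w_k$ is zero-mean and the perturbation starts at $\delta x_0^l = 0$. The transition matrix $\Phi(k,j)$ is notation introduced here only for illustration. Unrolling the linear recursion in (15) gives

\begin{align*}
\delta x_k^l &= \epsilon \sqrt{\Delta t} \sum_{j=0}^{k-1} \Phi(k, j+1)\, w_j, \qquad \Phi(k,j) = \bar{A}_{k-1}\bar{A}_{k-2}\cdots\bar{A}_{j}, \quad \Phi(k,k) = I,\\
E[\delta J_1^{\pi}] &= \sum_k \bar{C}_k\, E[\delta x_k^l] = 0.
\end{align*}

Under these assumptions the first-order term averages out, so the leading noise-dependent contribution to the mean cost comes from $\delta J_2^{\pi}$, which is quadratic in the linear perturbation and hence $O(\epsilon^2)$.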

Finally, we have the following result characterizing the cost of the policy as the discretization time $\Delta t \rightarrow 0$.

Proposition 1

Under A2, and given that the closed-loop system under the policy $\pi_t(\cdot)$ has a solution over the interval $[0,T]$, the mean of the cost-to-go function obeys: $\lim_{\Delta t \rightarrow 0} E[\mathcal{J}^{\pi}(x_0)] \equiv J^{\pi}(x_0) = J^{\pi,0}(x_0) + \epsilon^2 J^{\pi,1}(x_0) + \epsilon^4 J^{\pi,2}(x_0) + \mathcal{R}^{\pi}(x_0)$, for some constants $J^{\pi,k}(x_0)$, $k = 0, 1, 2$, where $\mathcal{R}^{\pi}(x_0)$ is $o(\epsilon^4)$, i.e., $\lim_{\epsilon \rightarrow 0} \epsilon^{-4} \mathcal{R}^{\pi}(x_0) = 0$. Furthermore, the term $J^{\pi,0}$ arises solely from the nominal control sequence while $J^{\pi,1}$ is solely dependent on the nominal control and the linear part of the perturbation closed-loop.

Remark 1

The interpretation of the result above is as follows: it shows that the $\epsilon^0$ term, $J^{\pi,0}$, in the cost, stems from the nominal action of the control policy, the $\epsilon^2$ term, $J^{\pi,1}$, stems from the linear feedback action of the closed-loop, while the higher order terms stem from the higher order terms in the feedback law. In the next section, we use the HJB equation to find the equations satisfied by these terms.

Remark 2

In the above development, we have derived the expression for the cost-to-go of a policy from the initial state $x_0$ at the initial time $t = 0$, i.e., the above expressions are for $J^{\pi}(0, x_0)$. However, such an expression is also valid for any pair $(t,x)$, simply by repeating the above development starting at time $t$ from state $x$, i.e., $J^{\pi}(t,x) = J^{\pi,0}(t,x) + \epsilon^2 J^{\pi,1}(t,x) + \epsilon^4 J^{\pi,2}(t,x) + \mathcal{R}^{\pi}(t,x)$.

III-B A Closeness Result for Optimal Stochastic and Deterministic Control

Recall the stochastic and deterministic HJB equations (4), (6) from Section II, and the associated optimal control policies (5) and (7). For simplicity, we consider the scalar case here; the vector case is detailed in the Appendix. Let $\varphi(t,x)$ denote the cost-to-go of the deterministic policy when applied to the stochastic system, i.e., $u^d$ applied to Eq. (2). Note that $\varphi(t,x)$ is different from the deterministic cost-to-go $\phi(t,x)$, and that $\varphi(t,x)$ satisfies a policy evaluation equation [bertsekas1]. Similar to the stochastic HJB, the continuous time policy evaluation equation for $\varphi(t,x)$ can be written as:

\begin{equation}
\frac{\partial \varphi}{\partial t} = l(x) + \frac{1}{2} r (u^d)^2 + \varphi^{x}(f(x) + g(x)u^d) + \frac{\epsilon^2}{2}\varphi^{xx}, \tag{16}
\end{equation}

where $u^d = -\frac{1}{r} g(x) \phi^{x}$. Then, we have the following key result. An analogous version of the following result was originally proved in a seminal paper [fleming1971stochastic] for first passage problems. We provide a simple derivation of the result for a finite time final value problem below.

Proposition 2

The cost function of the optimal stochastic policy, $J(t,x)$, and the cost function of the ``deterministic policy applied to the stochastic system'', $\varphi(t,x)$, satisfy: $J(t,x) = J^0(t,x) + \epsilon^2 J^1(t,x) + \epsilon^4 J^2(t,x) + \cdots$, and $\varphi(t,x) = \varphi^0(t,x) + \epsilon^2 \varphi^1(t,x) + \epsilon^4 \varphi^2(t,x) + \cdots$. Furthermore, $J^0(t,x) = \varphi^0(t,x)$, and $J^1(t,x) = \varphi^1(t,x)$, for all $(t,x)$.

Proof:

We show a sketch here for the case of a scalar state; please refer to the appendix for the complete proof.
Due to Proposition 1, the optimal cost function satisfies: $J(t,x) = J^0(t,x) + \epsilon^2 J^1(t,x) + \epsilon^4 J^2(t,x) + \cdots$. Next, we substitute the above equation into the HJB equation (4), along with the minimizing control (5), to obtain a perturbation expansion of the optimal cost function as a power series in $\epsilon^2$. Equating the $O(\epsilon^0)$ and $O(\epsilon^2)$ terms on both sides results in governing equations for the $J^0$ and $J^1$ terms. We also know that the cost function of the deterministic policy when applied to the stochastic system satisfies $\varphi(t,x) = \varphi^0(t,x) + \epsilon^2 \varphi^1(t,x) + \cdots$. As before, we substitute this expression into the policy evaluation equation (16), along with the deterministic optimal control expression $u^d = -\frac{1}{r} g(x) \phi^{x}$, to obtain the governing equations for $\varphi^0$ and $\varphi^1$. These equations, when compared with those for $J^0$ and $J^1$, are seen to be identical, with the same terminal conditions, thereby proving the result. ∎

Remark 3

$\mathcal{O}(\epsilon^{4})$ Near-Optimality of Linear Perturbation Feedback. According to Proposition 1, we know that the $O(\epsilon^{2})$ term in the perturbation expansion above stems from the linear feedback term for any policy, and thus, the same is true for the deterministic policy. Given an initial state $x_{0}$, let $(\bar{x}(t), \bar{u}(t))$ denote the optimal nominal trajectory under the deterministic feedback law and let $K_{t}$ denote the linear feedback corresponding to the expansion of the feedback law about this nominal trajectory. Therefore, it follows that if one applies the perturbation linear feedback law $u(t,x_{t}) = \bar{u}_{t} + K_{t} \delta x_{t}$, where the feedback acts on the perturbation from the nominal, $\delta x_{t} = x_{t} - \bar{x}_{t}$, starting at the initial state $x_{0}$, then the performance of this linear feedback policy is also within $O(\epsilon^{4})$ of the optimal stochastic policy.
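
The perturbation linear feedback law of Remark 3 can be illustrated with a minimal Python sketch for the scalar system $dx = (f(x) + g(x)u)dt + \epsilon dw$. The sketch assumes a simple Euler-Maruyama discretization with step `dt`. The function names `rollout_nominal` and `simulate_linear_feedback` and the inputs `u_bar` and `K` are hypothetical placeholders: the sketch does not compute the optimal nominal control or the optimal gain $K_{t}$, both of which must be supplied by the user.

```python
import random


def rollout_nominal(f, g, u_bar, x0, dt):
    """Euler rollout of the noise-free dynamics x' = f(x) + g(x) u under u_bar.

    u_bar is a list of nominal controls (assumed given, e.g. from an
    open-loop solver); returns the nominal states x_bar[0..N].
    """
    x_bar = [x0]
    for u in u_bar:
        x = x_bar[-1]
        x_bar.append(x + (f(x) + g(x) * u) * dt)
    return x_bar


def simulate_linear_feedback(f, g, x_bar, u_bar, K, eps, dt, seed=0):
    """Euler-Maruyama simulation of dx = (f + g u) dt + eps dw under the
    perturbation feedback u_t = u_bar_t + K_t * (x_t - x_bar_t).

    K is a list of user-supplied gains (hypothetical input, not computed here).
    """
    rng = random.Random(seed)
    x = x_bar[0]
    xs = [x]
    for k in range(len(u_bar)):
        delta_x = x - x_bar[k]           # perturbation from the nominal
        u = u_bar[k] + K[k] * delta_x    # linear perturbation feedback
        dw = rng.gauss(0.0, dt ** 0.5)   # Brownian increment with variance dt
        x = x + (f(x) + g(x) * u) * dt + eps * dw
        xs.append(x)
    return xs
```

With `eps = 0` and an initial state on the nominal, the perturbation $\delta x_{t}$ stays at zero and the simulated path reproduces the discrete nominal trajectory; the feedback term only becomes active once the noise pushes the state off the nominal.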

III-C A Perturbation Expansion of Deterministic Optimal Feedback Control: the Method of Characteristics (MOC)

In this section, we use the classical Method of Characteristics [Courant-Hilbert] to derive results regarding the deterministic optimal control problem. In particular, we show that satisfying the Minimum Principle is sufficient to assure us of a global optimum for the open-loop problem. Perhaps most importantly, we show that the deterministic cost-to-go function has a perturbation structure, in that the higher-order terms do not affect the lower-order terms in a Taylor expansion of the optimal feedback law. We also obtain the equations governing the linear and higher-order feedback terms, and show that the linear feedback gain is different from the standard LQR design. Again, for simplicity, we derive the following for the case of a scalar state; please see the Appendix for the vector case.

Let us recall the Hamilton-Jacobi-Bellman (HJB) equation in continuous time under the same assumptions as above, i.e., quadratic-in-control cost $c(x,u) = l(x) + \frac{1}{2} r u^{2}$, and affine-in-control dynamics $\dot{x} = f(x) + g(x)u$ [bryson]:

\begin{equation}
\frac{\partial J}{\partial t} + l(x) - \frac{1}{2} \frac{g(x)^{2}}{r} (J^{x})^{2} + f(x) J^{x} = 0, \tag{17}
\end{equation}

where $J = J(t,x)$, $J^{x} = \frac{\partial J}{\partial x}$, and the equation is integrated back in time with terminal condition $J(T,x) = c_{T}(x)$. Define $\frac{\partial J}{\partial t} = p$, $J^{x} = q$; then the HJB can be written as $F(t,x,J,p,q) = 0$, where $F(t,x,J,p,q) = p + l(x) - \frac{1}{2} \frac{g(x)^{2}}{r} q^{2} + f(x) q$. One can now write the Lagrange-Charpit equations [Courant-Hilbert] for the HJB as:

\begin{align}
\dot{x} &= F_{q} = f(x) - \frac{g(x)^{2}}{r} q, \tag{18} \\
\dot{q} &= -F_{x} - q F_{J} = -l^{x} + \frac{g(x) g^{x}}{r} q^{2} - f^{x} q, \tag{19}
\end{align}

with the terminal conditions $x(T) = x_{T}$, $q(T) = c_{T}^{x}(x_{T})$, where $F_{x} = \frac{\partial F}{\partial x}$, $F_{q} = \frac{\partial F}{\partial q}$, $g^{x} = \frac{\partial g}{\partial x}$, $l^{x} = \frac{\partial l}{\partial x}$, $f^{x} = \frac{\partial f}{\partial x}$ and $c_{T}^{x} = \frac{\partial c_{T}}{\partial x}$.
Given a terminal condition $x_{T}$, the equations above can be integrated back in time to yield a characteristic curve of the HJB PDE. Now, we show how one can use these equations to get a perturbation solution of the HJB, and in particular, the linear feedback gain $K_{t}$ corresponding to the optimal policy. The development also shows that the solution has a perturbation structure in that higher-order terms do not affect the lower-order terms.
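
The backward integration of the characteristic equations (18)-(19) can be sketched numerically as follows. This is a minimal Python sketch, assuming a classical fourth-order Runge-Kutta scheme with a negative time step; the function name `integrate_characteristic` and its argument names are hypothetical, and the user supplies $f$, $f^{x}$, $l^{x}$, $g$, $g^{x}$ and $c_{T}^{x}$ as callables.

```python
def integrate_characteristic(x_T, cTx, f, fx, lx, g, gx, r, T, n_steps=1000):
    """Integrate the characteristic ODEs (18)-(19) backward from t = T to t = 0.

    Returns lists (ts, xs, qs) ordered from t = T down to t = 0.
    All function arguments are user-supplied callables of the scalar state x.
    """

    def rhs(x, q):
        x_dot = f(x) - g(x) ** 2 / r * q                         # equation (18)
        q_dot = -lx(x) + g(x) * gx(x) / r * q ** 2 - fx(x) * q   # equation (19)
        return x_dot, q_dot

    h = -T / n_steps                  # negative step: march backward in time
    t, x, q = T, x_T, cTx(x_T)        # terminal conditions x(T), q(T)
    ts, xs, qs = [t], [x], [q]
    for _ in range(n_steps):
        # classical RK4 stages for the coupled (x, q) system
        k1x, k1q = rhs(x, q)
        k2x, k2q = rhs(x + 0.5 * h * k1x, q + 0.5 * h * k1q)
        k3x, k3q = rhs(x + 0.5 * h * k2x, q + 0.5 * h * k2q)
        k4x, k4q = rhs(x + h * k3x, q + h * k3q)
        x += h * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        q += h * (k1q + 2.0 * k2q + 2.0 * k3q + k4q) / 6.0
        t += h
        ts.append(t)
        xs.append(x)
        qs.append(q)
    return ts, xs, qs
```

As a sanity check, with $f = 0$, $l = 0$, $g = r = 1$ and $c_{T}(x) = \frac{1}{2} x^{2}$, equations (18)-(19) reduce to $\dot{x} = -q$, $\dot{q} = 0$, so the characteristic through $x_{T}$ is $q(t) = x_{T}$, $x(t) = x_{T}(1 + T - t)$, which the sketch should reproduce up to rounding error.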

Suppose now that one is given an optimal nominal trajectory $\bar{x}_{t}$, $t \in [0,T]$, for a given initial condition $x_{0}$, from solving the open-loop optimal control problem. Let the nominal terminal state be $\bar{x}_{T}$. We now expand the HJB solution around this nominal optimal solution. To this purpose, let $x_{t} = \bar{x}_{t} + \delta x_{t}$, for $t \in [0,T]$. Then, expanding the optimal cost function around the nominal yields: $J(t,x_{t}) = \bar{J}_{t} + G_{t} \delta x_{t} + \frac{1}{2} P_{t} \delta x_{t}^{2} + \frac{1}{6} S_{t} \delta x_{t}^{3} + \cdots,$ where $\bar{J}_{t} = J(t,\bar{x}_{t})$, $G_{t} = \frac{\partial J}{\partial x_{t}}\big|_{\bar{x}_{t}}$, $P_{t} = \frac{\partial^{2} J}{\partial x_{t}^{2}}\big|_{\bar{x}_{t}}$, $S_{t} = \frac{\partial^{3} J}{\partial x_{t}^{3}}\big|_{\bar{x}_{t}}$.
Then, the co-state $q = \frac{\partial J}{\partial x_{t}} = G_{t} + P_{t} \delta x_{t} + \frac{1}{2} S_{t} \delta x_{t}^{2} + \cdots$.

For simplicity, we assume that $g^{x} = 0$ (this is relaxed but at the expense of a rather tedious derivation detailed in the Appendix). Hence,

\begin{align*}
\underbrace{\frac{d}{dt}(\bar{x}_{t} + \delta x_{t})}_{\dot{\bar{x}}_{t} + \dot{\delta x}_{t}} ={}& \underbrace{f(\bar{x}_{t} + \delta x_{t})}_{(\bar{f}_{t} + \bar{f}_{t}^{x} \delta x_{t} + \frac{1}{2} \bar{f}_{t}^{xx} \delta x_{t}^{2} + O(\delta x_{t}^{3}))} \\
& - \frac{g^{2}}{r} \Big(G_{t} + P_{t} \delta x_{t} + \frac{1}{2} S_{t} \delta x_{t}^{2} + O(\delta x_{t}^{3})\Big),
\end{align*}

where $\bar{f}_{t} = f(\bar{x}_{t})$, $\bar{f}_{t}^{x} = \frac{\partial f}{\partial x_{t}}\big|_{\bar{x}_{t}}$. Expanding in powers of the perturbation variable $\delta x_{t}$, the equation above can be written as (after noting that $\dot{\bar{x}}_{t} = \bar{f}_{t} - \frac{g^{2}}{r} G_{t}$ due to the nominal trajectory $\bar{x}_{t}$ satisfying the characteristic equation):

\begin{equation}
\dot{\delta x}_{t} = \Big(\bar{f}_{t}^{x} - \frac{g^{2}}{r} P_{t}\Big) \delta x_{t} + \frac{1}{2} \Big(\bar{f}_{t}^{xx} - \frac{g^{2}}{r} S_{t}\Big) \delta x_{t}^{2} + O(\delta x_{t}^{3}). \tag{20}
\end{equation}

Next, since $g^{x} = 0$, the co-state equation (19) reduces to $\frac{dq}{dt} = -l^{x} - f^{x} q$, i.e.,

\begin{align}
\frac{d}{dt}\Big(G_{t} + P_{t} \delta x_{t} + \frac{1}{2} S_{t} \delta x_{t}^{2} + O(\delta x_{t}^{3})\Big) ={}& -\Big(\bar{l}_{t}^{x} + \bar{l}_{t}^{xx} \delta x_{t} + \frac{1}{2} \bar{l}_{t}^{xxx} \delta x_{t}^{2} + O(\delta x_{t}^{3})\Big) \nonumber \\
& - \Big(\bar{f}_{t}^{x} + \bar{f}_{t}^{xx} \delta x_{t} + \frac{1}{2} \bar{f}_{t}^{xxx} \delta x_{t}^{2} + O(\delta x_{t}^{3})\Big) \nonumber \\
& \times \Big(G_{t} + P_{t} \delta x_{t} + \frac{1}{2} S_{t} \delta x_{t}^{2} + O(\delta x_{t}^{3})\Big), \tag{21}
\end{align}

where $\bar{f}_{t}^{xx} = \frac{\partial^{2} f}{\partial x^{2}}\big|_{\bar{x}_{t}}$, $\bar{f}_{t}^{xxx} = \frac{\partial^{3} f}{\partial x^{3}}\big|_{\bar{x}_{t}}$, $\bar{l}_{t}^{x} = \frac{\partial l}{\partial x}\big|_{\bar{x}_{t}}$, $\bar{l}_{t}^{xx} = \frac{\partial^{2} l}{\partial x^{2}}\big|_{\bar{x}_{t}}$, $\bar{l}_{t}^{xxx} = \frac{\partial^{3} l}{\partial x^{3}}\big|_{\bar{x}_{t}}$. Using $\frac{d}{dt}(P_{t} \delta x_{t}) = \dot{P}_{t} \delta x_{t} + P_{t} \dot{\delta x}_{t}$, $\frac{d}{dt}(S_{t} \delta x_{t}^{2}) = \dot{S}_{t} \delta x_{t}^{2} + 2 S_{t} \delta x_{t} \dot{\delta x}_{t}$, and substituting for $\dot{\delta x}_{t}$ from (20), and expanding the two sides above in powers of $\delta x_{t}$ yields:

\begin{align*}
& \dot{G}_{t} + \Big(\dot{P}_{t} + P_{t} \big(\bar{f}_{t}^{x} - \frac{g^{2}}{r} P_{t}\big)\Big) \delta x_{t} + \frac{1}{2} \Big(P_{t} \big(\bar{f}_{t}^{xx} - \frac{g^{2}}{r} S_{t}\big) \\
& \qquad + \dot{S}_{t} + 2 S_{t} \big(\bar{f}_{t}^{x} - \frac{g^{2}}{r} P_{t}\big)\Big) \delta x_{t}^{2} + O(\delta x_{t}^{3}) \\
& \quad = -(\bar{l}_{t}^{x} + \bar{f}_{t}^{x} G_{t}) - (\bar{l}_{t}^{xx} + \bar{f}_{t}^{x} P_{t} + \bar{f}_{t}^{xx} G_{t}) \delta x_{t} \\
& \qquad - \frac{1}{2} (\bar{l}_{t}^{xxx} + \bar{f}_{t}^{xxx} G_{t} + 2 \bar{f}_{t}^{xx} P_{t} + \bar{f}_{t}^{x} S_{t}) \delta x_{t}^{2} + O(\delta x_{t}^{3}).
\end{align*}

Equating the coefficients of the first three powers of $\delta x_{t}$ (orders zero through two) yields:

\begin{align}
&\dot{G}_{t}+\bar{l}_{t}^{x}+\bar{f}_{t}^{x}G_{t}=0, \tag{22}\\
&\dot{P}_{t}+\bar{l}_{t}^{xx}+P_{t}\bar{f}_{t}^{x}+\bar{f}_{t}^{x}P_{t}-P_{t}\frac{g^{2}}{r}P_{t}+\bar{f}_{t}^{xx}G_{t}=0, \tag{23}\\
&\dot{S}_{t}+\bar{l}_{t}^{xxx}+P_{t}\bar{f}_{t}^{xx}+2\bar{f}_{t}^{xx}P_{t}+\bar{f}_{t}^{x}S_{t}+2S_{t}\bar{f}_{t}^{x}-P_{t}\frac{g^{2}}{r}S_{t} \nonumber\\
&\qquad-2S_{t}\frac{g^{2}}{r}P_{t}+\bar{f}_{t}^{xxx}G_{t}=0. \tag{24}
\end{align}

Using the first-order necessary condition $u(t,x_{t})=-\frac{g}{r}J^{x}$, the optimal feedback law is given by:

\begin{align}
u(t,x_{t}) &=-\frac{g}{r}J^{x}=\underbrace{-\frac{g}{r}G_{t}}_{\bar{u}_{t}}\;\underbrace{-\frac{g}{r}P_{t}}_{K_{t}}\,\delta x_{t}\;\underbrace{-\frac{g}{2r}S_{t}}_{K^{(2)}_{t}}\,\delta x_{t}^{2}+O(\delta x_{t}^{3}), \tag{25}\\
u(t,x_{t}) &=\bar{u}_{t}+K_{t}\delta x_{t}+K^{(2)}_{t}\delta x_{t}^{2}+O(\delta x_{t}^{3}). \nonumber
\end{align}

Thus, the optimal feedback law has a perturbation structure: the second-order term $P_{t}$ does not affect the first-order term $G_{t}$, the third- and higher-order terms ($S_{t}$, etc.) do not affect the second-order term $P_{t}$, and the same holds at every higher order.
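
As an illustration of this triangular structure, consider a hypothetical scalar example (our assumption, chosen only for concreteness) with cubic drift $f(x)=ax+bx^{3}$, quadratic running cost $l(x)=\frac{q}{2}x^{2}$, and constant $g$, where $a$, $b$, and $q$ are illustrative constants. Then $\bar{f}_{t}^{x}=a+3b\bar{x}_{t}^{2}$, $\bar{f}_{t}^{xx}=6b\bar{x}_{t}$, $\bar{f}_{t}^{xxx}=6b$, $\bar{l}_{t}^{x}=q\bar{x}_{t}$, $\bar{l}_{t}^{xx}=q$, $\bar{l}_{t}^{xxx}=0$, and substituting into (22)-(24) gives:

\begin{align*}
&\dot{G}_{t}+q\bar{x}_{t}+(a+3b\bar{x}_{t}^{2})G_{t}=0,\\
&\dot{P}_{t}+q+2(a+3b\bar{x}_{t}^{2})P_{t}-\frac{g^{2}}{r}P_{t}^{2}+6b\bar{x}_{t}G_{t}=0,\\
&\dot{S}_{t}+18b\bar{x}_{t}P_{t}+3(a+3b\bar{x}_{t}^{2})S_{t}-3\frac{g^{2}}{r}P_{t}S_{t}+6bG_{t}=0.
\end{align*}

In this sketch, the first equation involves only $G_{t}$ (together with the nominal state $\bar{x}_{t}$), the second involves $G_{t}$ and $P_{t}$ but not $S_{t}$, and the third is linear in $S_{t}$ once $G_{t}$ and $P_{t}$ are known, so the equations can be solved in sequence. Setting $b=0$ removes the $6b\bar{x}_{t}G_{t}$ term and leaves the familiar scalar Riccati equation for $P_{t}$.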
We now state the result for the general vector case with a state-dependent control influence matrix (please see the Appendix for details). We ignore the $O(\delta x_{t}^{2})$ and higher-order terms in the feedback law purely for notational convenience.

Definition 1

Let the control influence matrix be given as: $\mathcal{G}(x)=\begin{bmatrix}g_{1}^{1}(x) & \cdots & g_{1}^{p}(x)\\ \vdots & \ddots & \vdots\\ g_{n}^{1}(x) & \cdots & g_{n}^{p}(x)\end{bmatrix}=\begin{bmatrix}\Gamma^{1}(x) & \cdots & \Gamma^{p}(x)\end{bmatrix}$, i.e., $\Gamma^{j}$ represents the control influence vector corresponding to the $j^{th}$ input. Let $\bar{\mathcal{G}}_{t}=\mathcal{G}(\bar{x}_{t})$, where $\{\bar{x}_{t}\}$ represents the optimal nominal trajectory.
Further, let $\mathcal{F}=\begin{bmatrix}f_{1}(x) & \cdots & f_{n}(x)\end{bmatrix}^{\intercal}$ denote the drift of the system. Let $G_{t}=[G_{t}^{1}\cdots G_{t}^{n}]^{\intercal}$ denote the optimal nominal co-state vector, and let $\bar{u}_{t}=[\bar{u}_{t}^{1}\cdots\bar{u}_{t}^{p}]^{\intercal}=-R^{-1}\bar{\mathcal{G}}_{t}^{\intercal}G_{t}$ denote the optimal nominal control vector. Let the Jacobian and Hessian of our system matrices be defined as:

\begin{align}
\bar{\mathcal{F}}_{t}^{x} &=\left.\begin{bmatrix}\frac{\partial f_{1}}{\partial x_{1}} & \cdots & \frac{\partial f_{1}}{\partial x_{n}}\\ \vdots & \ddots & \vdots\\ \frac{\partial f_{n}}{\partial x_{1}} & \cdots & \frac{\partial f_{n}}{\partial x_{n}}\end{bmatrix}\right|_{\bar{x}_{t}},\quad
\bar{\mathcal{F}}_{t}^{xx,i}=\left.\begin{bmatrix}\frac{\partial^{2}f_{1}}{\partial x_{1}\partial x_{i}} & \cdots & \frac{\partial^{2}f_{1}}{\partial x_{n}\partial x_{i}}\\ \vdots & \ddots & \vdots\\ \frac{\partial^{2}f_{n}}{\partial x_{1}\partial x_{i}} & \cdots & \frac{\partial^{2}f_{n}}{\partial x_{n}\partial x_{i}}\end{bmatrix}\right|_{\bar{x}_{t}}, \nonumber\\
\bar{\mathcal{G}}_{t}^{x,i} &=\left.\begin{bmatrix}\frac{\partial g_{1}^{1}}{\partial x_{i}} & \cdots & \frac{\partial g_{1}^{p}}{\partial x_{i}}\\ \vdots & \ddots & \vdots\\ \frac{\partial g_{n}^{1}}{\partial x_{i}} & \cdots & \frac{\partial g_{n}^{p}}{\partial x_{i}}\end{bmatrix}\right|_{\bar{x}_{t}}. \tag{26}
\end{align}

Similarly, $\bar{\Gamma}^{j,x}_{t}=\nabla_{x}\Gamma^{j}|_{\bar{x}_{t}}$ and $\bar{\Gamma}^{j,xx,i}_{t}=\nabla_{xx}\Gamma^{j}|_{\bar{x}_{t}}$ for the vector function $\Gamma^{j}$.
Finally, define $\mathcal{A}_{t}=\bar{\mathcal{F}}_{t}^{x}+\sum_{j=1}^{p}\bar{\Gamma}_{t}^{j,x}\bar{u}_{t}^{j}$, $\bar{L}_{t}^{x}=\nabla_{x}l|_{\bar{x}_{t}}$, and $\bar{L}_{t}^{xx}=\nabla^{2}_{xx}l|_{\bar{x}_{t}}$.
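
To make the notation concrete, consider a hypothetical two-state, single-input system (an illustrative assumption, not a system analyzed in this paper) with drift $\mathcal{F}(x)=[x_{2},\ -\sin x_{1}]^{\intercal}$ and constant influence matrix $\mathcal{G}=\Gamma^{1}=[0,\ 1]^{\intercal}$, so that $n=2$ and $p=1$. Writing $\bar{x}_{1,t}$ for the first component of $\bar{x}_{t}$ (our notation), the quantities above would evaluate to:

\begin{align*}
\bar{\mathcal{F}}_{t}^{x} &=\begin{bmatrix}0 & 1\\ -\cos\bar{x}_{1,t} & 0\end{bmatrix},\quad
\bar{\mathcal{F}}_{t}^{xx,1}=\begin{bmatrix}0 & 0\\ \sin\bar{x}_{1,t} & 0\end{bmatrix},\quad
\bar{\mathcal{F}}_{t}^{xx,2}=\begin{bmatrix}0 & 0\\ 0 & 0\end{bmatrix},\\
\bar{\mathcal{G}}_{t}^{x,1} &=\bar{\mathcal{G}}_{t}^{x,2}=\begin{bmatrix}0\\ 0\end{bmatrix},\quad
\bar{\Gamma}_{t}^{1,x}=\begin{bmatrix}0 & 0\\ 0 & 0\end{bmatrix},\quad
\mathcal{A}_{t}=\bar{\mathcal{F}}_{t}^{x}.
\end{align*}

Because $\mathcal{G}$ is constant in this sketch, all derivatives of the influence matrix vanish and $\mathcal{A}_{t}$ reduces to the drift Jacobian; only $\bar{\mathcal{F}}_{t}^{xx,1}$ carries second-order information about the nonlinearity.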

Proposition 3

Under A1, and given the above definitions, the following result holds for the evolution of the co-state/gradient vector $G_{t}$ and the Hessian matrix $P_{t}$ of the optimal cost function $J_{t}(x_{t})$, evaluated on the optimal nominal trajectory $\bar{x}_{t},\ t\in[0,T]$:

\begin{align}
&\dot{G}_{t}+\bar{L}_{t}^{x}+\mathcal{A}_{t}^{\intercal}G_{t}=0, \tag{27}\\
&\dot{P}_{t}+\mathcal{A}_{t}^{\intercal}P_{t}+P_{t}\mathcal{A}_{t}+\bar{L}_{t}^{xx} \nonumber\\
&\quad+\sum_{i=1}^{n}\Big[\bar{\mathcal{F}}_{t}^{xx,i}+\sum_{j=1}^{p}\bar{\Gamma}_{t}^{j,xx,i}\bar{u}_{t}^{j}\Big]G_{t}^{i}-K_{t}^{\intercal}RK_{t}=0, \tag{28}\\
&K_{t}=-R^{-1}\Big[\sum_{i=1}^{n}\bar{\mathcal{G}}_{t}^{x,i,\intercal}G_{t}^{i}+\bar{\mathcal{G}}_{t}^{\intercal}P_{t}\Big], \tag{29}
\end{align}

with terminal conditions $G_{T}=\nabla_{x}c_{T}|_{\bar{x}_{T}}$ and $P_{T}=\nabla_{xx}^{2}c_{T}|_{\bar{x}_{T}}$. The control input with the optimal linear feedback is given by $u_{t}=\bar{u}_{t}+K_{t}\delta x_{t}$.
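
As a consistency check, the vector result can be specialized to the scalar case with a constant input gain. Taking $n=p=1$, $\mathcal{G}=g$ constant, and $R=r$, we have $\bar{\mathcal{G}}_{t}^{x,i}=0$, $\bar{\Gamma}_{t}^{j,x}=0$, $\bar{\Gamma}_{t}^{j,xx,i}=0$, and $\mathcal{A}_{t}=\bar{f}_{t}^{x}$, so that (27)-(29) reduce to:

\begin{align*}
&\dot{G}_{t}+\bar{l}_{t}^{x}+\bar{f}_{t}^{x}G_{t}=0,\\
&K_{t}=-\frac{g}{r}P_{t},\qquad K_{t}^{\intercal}RK_{t}=\frac{g^{2}}{r}P_{t}^{2},\\
&\dot{P}_{t}+2\bar{f}_{t}^{x}P_{t}+\bar{l}_{t}^{xx}+\bar{f}_{t}^{xx}G_{t}-\frac{g^{2}}{r}P_{t}^{2}=0.
\end{align*}

These agree term by term with the scalar equations (22) and (23) and with the linear gain $K_{t}$ in (25).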

Remark 4

Not standard LQR. The co-state equation (27) above is identical to the co-state equation in the Minimum Principle [bryson, Pontryagin]. However, the Hessian $P_{t}$ equation (28) is Riccati-like with some important differences: note the extra second-order terms due to $\bar{\mathcal{F}}_{t}^{xx,i}$ and $\bar{\Gamma}_{t}^{j,xx,i}$ in the second line, stemming from the nonlinear drift and input influence vectors, and an extra term in the gain equation (29) coming from the state-dependent influence matrix. These terms are not present in the LQR Riccati equation, and thus this cannot be a traditional perturbation feedback design [bryson, Ch. 6]. If the input influence matrix is independent of the state, the first term in the second line remains, and hence the result still differs from the LQR case.
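
To see exactly which terms separate (28)-(29) from the LQR case, consider the hypothetical linear-quadratic specialization (the symbols $A_{0}$, $B_{0}$, and $W$ are our illustrative notation, with $W$ symmetric): $\mathcal{F}(x)=A_{0}x$, $\mathcal{G}(x)=B_{0}$, and $l(x)=\frac{1}{2}x^{\intercal}Wx$. Then $\bar{\mathcal{F}}_{t}^{xx,i}=0$, $\bar{\mathcal{G}}_{t}^{x,i}=0$, $\bar{\Gamma}_{t}^{j,x}=0$, $\bar{\Gamma}_{t}^{j,xx,i}=0$, $\mathcal{A}_{t}=A_{0}$, and $\bar{L}_{t}^{xx}=W$, so that:

\begin{align*}
&K_{t}=-R^{-1}B_{0}^{\intercal}P_{t},\\
&\dot{P}_{t}+A_{0}^{\intercal}P_{t}+P_{t}A_{0}+W-P_{t}B_{0}R^{-1}B_{0}^{\intercal}P_{t}=0,
\end{align*}

which is the standard differential Riccati equation. For a nonlinear drift with constant $\mathcal{G}=B_{0}$, the gain keeps this LQR form with $\mathcal{A}_{t}=\bar{\mathcal{F}}_{t}^{x}$, but the Hessian equation retains the additional term $\sum_{i=1}^{n}\bar{\mathcal{F}}_{t}^{xx,i}G_{t}^{i}$, which couples $P_{t}$ to the co-state.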

Remark 5

Convexity and Global Minimum. Recall the Lagrange-Charpit equations for solving the HJB (18), (19). Given an unconstrained control, under standard smoothness assumptions on the involved functions, the characteristic curves governed by the equations in $(x,q)$ space are unique and do not intersect. Therefore, the open-loop optimal trajectory found by satisfying the Minimum Principle is also the unique global minimum, even though the open-loop problem is non-convex. This observation is formalized in the following result.

Proposition 4

Global Optimality of open-loop solution. Let the cost functions $l(\cdot)$, $c_{T}(\cdot)$, the drift $f(\cdot)$ and the input influence function $g(\cdot)$ be $\mathcal{C}^{2}$, i.e., twice continuously differentiable, and let a solution to (18)-(19) exist in $[0,T]$ for any terminal condition $(x_{T},q_{T})$. Under A1, an optimal trajectory that satisfies the Minimum Principle from a given initial state $x_{0}$ is the unique global minimum of the open-loop problem starting at the initial state $x_{0}$.

III-D Loss of Perturbation Structure in Stochastic Control

Finally, we outline the loss of the perturbation structure in the stochastic problem. For simplicity, we consider only the scalar case in continuous time. Even this case brings out the difficulty associated with stochastic control, and the generalization to the vector case is relatively straightforward, albeit somewhat tedious.

Recall the stochastic HJB:

\begin{equation}
-\frac{\partial J}{\partial t}=\min_{u}[H(x,u)]+\frac{\epsilon^{2}}{2}\frac{\partial^{2}J}{\partial x^{2}}, \tag{30}
\end{equation}

where $H(x,u)=l(x)+\frac{1}{2}ru^{2}+(f(x)+gu)\frac{\partial J}{\partial x}$ is the Hamiltonian of the system, and the equation is integrated backwards from a terminal condition $J(T,x)=c_{T}(x)$. For simplicity, we assume that $g$ is not state dependent in the following derivation, and we also assume the noise variance $Q=1$, which otherwise would appear in the diffusion term in Eq. (30). Suppose now that we are given the optimal policy $u(t,x)$ and suppose that the nominal trajectory of the system (without noise) starting at some $x_{0}$ is given by $\{\bar{x}_{t}\}$ under the nominal control $\{\bar{u}_{t}\}$. As was done previously, let us now expand the solution of the equation above in terms of the perturbations from this nominal trajectory, $\delta x_{t}=x_{t}-\bar{x}_{t}$.
Then, given the optimal nominal control $\bar{u}_{t}$, we can solve the minimization of the Hamiltonian as:

\begin{align}
\min_{u_{t}}\ H(x_{t},u_{t}) &=\min_{\delta u_{t}}H(\bar{x}_{t}+\delta x_{t},\bar{u}_{t}+\delta u_{t}) \tag{31}\\
&=\min_{\delta u_{t}}\Big[l(\bar{x}_{t}+\delta x_{t})+\frac{r}{2}\bar{u}_{t}^{2}+(f(\bar{x}_{t}+\delta x_{t})+g\bar{u}_{t})\frac{\partial J}{\partial x} \nonumber\\
&\qquad+r\bar{u}_{t}\delta u_{t}+\frac{r}{2}\delta u_{t}^{2}+g\delta u_{t}\frac{\partial J}{\partial x}\Big], \nonumber
\end{align}

which leads to the necessary condition for a minimum:

\begin{equation}
\Big(g\frac{\partial J}{\partial x} + r\bar{u}_t\Big) + r\,\delta u_t = 0, \tag{32}
\end{equation}

which is also sufficient for a minimum since $r > 0$, leading to $H$ being strictly quadratic in the variable $\delta u_t$. From Eq. (32), the optimizing perturbation control is given by $\delta u_t = -\bar{u}_t - \frac{g}{r}\frac{\partial J}{\partial x}$.
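
As a brief intermediate step (a sketch that uses only the definitions above), substituting this minimizer back into the Hamiltonian makes the dependence on $\bar{u}_t$ drop out. The total control is $u_t = \bar{u}_t + \delta u_t = -\frac{g}{r}\frac{\partial J}{\partial x}$, so that
\begin{equation*}
\min_{u_t} H(x_t, u_t) = l(x_t) + \frac{r}{2}\Big(\frac{g}{r}\frac{\partial J}{\partial x}\Big)^{2} + \Big(f(x_t) - \frac{g^{2}}{r}\frac{\partial J}{\partial x}\Big)\frac{\partial J}{\partial x} = l(x_t) - \frac{g^{2}}{2r}\Big(\frac{\partial J}{\partial x}\Big)^{2} + f(x_t)\frac{\partial J}{\partial x}.
\end{equation*}
This is the form that is expanded in powers of $\delta x_t$ in Eq. (34).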

Now, let us expand the dynamics and the optimal cost function in the HJB in terms of their perturbations from the nominal trajectory: $f(x_t) = f(\bar{x}_t) + F_t^{1}\delta x_t + \frac{1}{2}F_t^{2}\delta x_t^{2} + \cdots$, $J(t,x_t) = \bar{J}_t(\bar{x}_t) + K_t^{1}\delta x_t + \frac{1}{2}K_t^{2}\delta x_t^{2} + \cdots$, where the $F_t^{i}, K_t^{i}$ represent the Taylor coefficients of the series expansion of these functions. Therefore, $\frac{\partial J}{\partial x} = K_t^{1} + K_t^{2}\delta x_t + \frac{K_t^{3}}{2}\delta x_t^{2} + \cdots$, $\frac{\partial^{2} J}{\partial x^{2}} = K_t^{2} + K_t^{3}\delta x_t + \frac{1}{2}K_t^{4}\delta x_t^{2} + \cdots$. Noting that the variable $x_t = \bar{x}_t + \delta x_t$, i.e., the space variable has an explicit time dependence via the nominal trajectory, it follows that:

\begin{align}
\frac{\partial J(t,x_t)}{\partial t} &= \Big[\dot{\bar{J}}_t(\bar{x}_t) + \dot{K}_t^{1}\delta x_t + \frac{1}{2}\dot{K}_t^{2}\delta x_t^{2} + \cdots\Big] \nonumber \\
&\quad - \dot{\bar{x}}_t\Big[K_t^{1} + K_t^{2}\delta x_t + \frac{K_t^{3}}{2}\delta x_t^{2} + \cdots\Big], \tag{33}
\end{align}

where $\dot{\bar{J}}_t(\bar{x}_t), \dot{K}_t^{1}, \cdots$ are total derivatives with respect to $t$, since they depend only on time.

Then, using the above expressions, and writing $L_t^{i}$ for the Taylor coefficients of $l$ about $\bar{x}_t$, one can express the minimum value of the Hamiltonian in terms of the state perturbations $\delta x_t$ as:

\begin{align}
\min_{u_t} H(x_t, u_t) &= \Big[l(\bar{x}_t) + L_t^{1}\delta x_t + \frac{L_t^{2}}{2}\delta x_t^{2} + \cdots\Big] \nonumber \\
&\quad - \frac{g^{2}}{2r}\Big[K_t^{1} + K_t^{2}\delta x_t + \frac{K_t^{3}}{2}\delta x_t^{2} + \cdots\Big]^{2} \nonumber \\
&\quad + \Big(f(\bar{x}_t) + F_t^{1}\delta x_t + \frac{F_t^{2}}{2}\delta x_t^{2} + \cdots\Big)\Big(K_t^{1} + K_t^{2}\delta x_t + \frac{K_t^{3}}{2}\delta x_t^{2} + \cdots\Big). \tag{34}
\end{align}

Next, noting that $\dot{\bar{x}}_t = f(\bar{x}_t) + g\bar{u}_t$, we obtain the following equations for the evolution of the Taylor coefficients of the optimal cost function by equating the different powers of $\delta x_t$ on both sides of the stochastic HJB (Eq. (30)), using the expansions given in Eqs. (33) and (34):

\begin{align}
-\dot{\bar{J}}_t &= \bar{l}_t - \frac{g}{r}\Big(\frac{gK_t^{1}}{2} + r\bar{u}_t\Big)K_t^{1} + \epsilon^{2}K_t^{2}, \tag{35} \\
-\dot{K}_t^{1} &= L_t^{1} + F_t^{1}K_t^{1} - \frac{g}{r}\big(gK_t^{1} + r\bar{u}_t\big)K_t^{2} + \epsilon^{2}K_t^{3}, \tag{36} \\
-\dot{K}_t^{2} &= L_t^{2} + 2F_t^{1}K_t^{2} + F_t^{2}K_t^{1} - \frac{g^{2}}{r}(K_t^{2})^{2} \nonumber \\
&\quad - \frac{g}{r}\big(gK_t^{1} + r\bar{u}_t\big)K_t^{3} + \epsilon^{2}K_t^{4}, \tag{37}
\end{align}

where we have written out the equations for the first three terms of the expansion; similar equations may be derived for the higher-order terms as well. At this point, we make the following remarks regarding the perturbation expansion above.

Remark 6

Computational Intractability of the Stochastic Problem. The equations above show that, unlike in the deterministic case, the lower-order terms in the stochastic problem are affected by the higher-order terms. Thus, in order to compute the stochastic law, we have to approximate to a high enough order to ensure accuracy in the solution, which in turn implies that the solution of the stochastic problem is very prone to errors. To see this, note that if we were to expand the solution to the $n^{th}$ order, the Taylor coefficient $K_t^{n}$ would be affected by the coefficients $K_t^{n+1}$ and $K_t^{n+2}$, and therefore these higher-order coefficients would need to be sufficiently small for the resulting solution to be accurate. However, if one approximates to a very high order $n$, quite apart from the obvious curse of dimensionality issue, the resulting system of equations becomes severely ill-conditioned and, consequently, highly sensitive to small errors in the data. Please see our related paper [RL_conv] for the relevant details on this aspect.
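
For contrast, one case in which the hierarchy does close is the linear-quadratic one. The sketch below is an illustration under assumptions that are not part of the development above: $f(x) = ax$, $l(x) = \frac{1}{2}qx^{2}$ and $c_T(x) = \frac{1}{2}q_T x^{2}$. In that case $F_t^{1} = a$, $F_t^{2} = 0$, $L_t^{2} = q$, and the coefficients $K_t^{3}, K_t^{4}$ vanish, so Eq. (37) reduces to a scalar Riccati equation that can be integrated backward from $K_T^{2} = q_T$ without any truncation. The function names and parameter values are hypothetical.

```python
import math


def k2_rhs(k2, a, g, q, r):
    """Right-hand side of Eq. (37) with F^1 = a, L^2 = q and F^2 = K^3 = K^4 = 0."""
    return q + 2.0 * a * k2 - (g * g / r) * k2 * k2


def integrate_k2(T, q_T, a, g, q, r, n_steps=1000):
    """Integrate -dK2/dt = k2_rhs backward from K2(T) = q_T with classical RK4.

    In reversed time s = T - t the equation reads dK2/ds = k2_rhs(K2).
    Returns the value of K2 at t = 0.
    """
    h = T / n_steps
    k2 = q_T
    for _ in range(n_steps):
        s1 = k2_rhs(k2, a, g, q, r)
        s2 = k2_rhs(k2 + 0.5 * h * s1, a, g, q, r)
        s3 = k2_rhs(k2 + 0.5 * h * s2, a, g, q, r)
        s4 = k2_rhs(k2 + h * s3, a, g, q, r)
        k2 += h * (s1 + 2.0 * s2 + 2.0 * s3 + s4) / 6.0
    return k2


# Sanity check (assumed values): a = 0, g = q = r = 1, q_T = 0,
# for which the closed-form solution is K2(0) = tanh(T).
print(integrate_k2(T=1.0, q_T=0.0, a=0.0, g=1.0, q=1.0, r=1.0), math.tanh(1.0))
```

For a nonlinear $f$ or a non-quadratic $l$, the terms in $K_t^{3}$ and $K_t^{4}$ do not vanish in general when $\epsilon > 0$, and no such closed system is available; that is the situation the remark describes.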

Remark 7

The Deterministic Problem. The expressions above also allow us to find the perturbation expansions for the deterministic problem. It is key to note that if the problem considered is deterministic, then $gK_t^{1} + r\bar{u}_t = 0$ due to the minimum principle, and since $\epsilon = 0$ in the deterministic problem, we obtain the expressions that we derived via the Method of Characteristics in the previous section. The Method of Characteristics is still necessary since it allows us to establish the uniqueness of the optimal nominal trajectory $(\bar{x}_t, \bar{u}_t)$. Thus, the above development can be thought of as an alternative way to derive the perturbation expansion result. Furthermore, if we are required to derive the cost-to-go of the deterministic policy when applied to the stochastic system, then although $gK_t^{1} + r\bar{u}_t = 0$ due to optimality, there is still coupling from the higher-order terms due to stochasticity, arising from the $O(\epsilon^{2})$ terms above, and thus even this case is intractable to compute. However, since we are interested only in the deterministic feedback law, such a computation is unnecessary.
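
To make the decoupling explicit, the following short check is obtained here by direct substitution into Eqs. (35) to (37); it is not quoted from the previous section. Setting $\epsilon = 0$ and $gK_t^{1} + r\bar{u}_t = 0$, and using $-\frac{g}{r}\big(\frac{gK_t^{1}}{2} + r\bar{u}_t\big)K_t^{1} = \frac{r}{2}\bar{u}_t^{2}$ under the same condition, gives
\begin{align*}
-\dot{\bar{J}}_t &= \bar{l}_t + \frac{r}{2}\bar{u}_t^{2}, \\
-\dot{K}_t^{1} &= L_t^{1} + F_t^{1}K_t^{1}, \\
-\dot{K}_t^{2} &= L_t^{2} + 2F_t^{1}K_t^{2} + F_t^{2}K_t^{1} - \frac{g^{2}}{r}(K_t^{2})^{2}.
\end{align*}
Each equation now involves only coefficients of the same or lower order, so the system can be integrated backward one order at a time without truncating higher-order terms.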

IV THE OPTIMALITY OF SHRINKING HORIZON MODEL PREDICTIVE CONTROL

In our developments till this point, we have shown that the deterministic feedback law is near-optimal with respect to the optimal stochastic law and that it has a perturbation structure that is lost in the stochastic problem. However, solving the deterministic DP problem is also subject to the Curse of Dimensionality. Nonetheless, owing to the perturbation structure, one can solve the deterministic problem locally (up to the linear feedback term), and then replan at fixed decision time epochs, assuming that the time between the decision epochs is small enough that the local feedback law remains valid in between the epochs. Thus, consider a Model Predictive type approach to solving the stochastic control problem. We outline the algorithmic procedure in Algorithm 1 to highlight that our advocated procedure is slightly different from the traditional MPC approach studied in the literature [Mayne_1, Mayne_2].

\begin{algorithm}
\caption{Shrinking Horizon MPC (MPC-SH)}
\textbf{Given:} initial state $x_0$, time horizon $T$, cost $c(x,u) = l(x) + \frac{1}{2}u^{\mathsf{T}} R u$, terminal cost $c_T(x)$, and decision epoch time $\Delta$.\\
Set $N = \frac{T}{\Delta}$, $t = 0$, $x_i = x_0$.\\
\textbf{while} $t < N\Delta$ \textbf{do}
\begin{enumerate}
\item Solve the open-loop (noise-free) optimal control problem for initial state $x_i$, along with the associated linear perturbation feedback, for the horizon $(N\Delta - t)$. Let the perturbation feedback law be denoted by $u(t,x) = \bar{u}_t + K_t\delta x_t$, where $\delta x_t = x_t - \bar{x}_t$ and $(\bar{x}_t, \bar{u}_t)$ is the optimal nominal trajectory.
\item Apply the perturbation feedback law $u(t,x)$ till time $(t + \Delta)$ and observe the state $x_f = x_{(t+\Delta)}$.
\item Set $t = t + \Delta$, $x_i = x_f$.
\end{enumerate}
\textbf{end while}
\end{algorithm}
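
The loop in Algorithm 1 can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses a scalar linear-quadratic stand-in, $dx = (ax + gu)\,dt + \epsilon\,dw$ with cost $\frac{1}{2}\int_0^T (qx^{2} + ru^{2})\,dt + \frac{1}{2}q_T x(T)^{2}$, discretised by an Euler step, so that the open-loop solve and the linear feedback gains come from a single backward Riccati sweep. The names `solve_open_loop` and `mpc_sh` and all parameter values are hypothetical; a nonlinear problem would need an iterative open-loop solver in place of the Riccati sweep.

```python
import numpy as np

# Assumed scalar LQ stand-in: dx = (A_C*x + G*u) dt + eps dw,
# cost 0.5*int(Q*x^2 + R*u^2) dt + 0.5*Q_T*x(T)^2. All values are illustrative.
A_C, G, Q, R, Q_T = -1.0, 1.0, 1.0, 1.0, 10.0


def solve_open_loop(x_init, n_steps, dt):
    """Hypothetical open-loop solver: nominal trajectory plus linear feedback gains.

    Returns (x_bar, u_bar, gains, p0), where 0.5*p0*x_init**2 is the optimal
    noise-free cost of the Euler-discretised problem over n_steps steps.
    """
    a_d, b_d = 1.0 + A_C * dt, G * dt
    p = Q_T
    gains = np.zeros(n_steps)
    for k in reversed(range(n_steps)):  # backward Riccati sweep
        gains[k] = -(b_d * p * a_d) / (R * dt + b_d * b_d * p)
        p = Q * dt + a_d * a_d * p + a_d * b_d * p * gains[k]
    x_bar = np.zeros(n_steps + 1)
    u_bar = np.zeros(n_steps)
    x_bar[0] = x_init
    for k in range(n_steps):  # noise-free forward rollout of the nominal
        u_bar[k] = gains[k] * x_bar[k]
        x_bar[k + 1] = a_d * x_bar[k] + b_d * u_bar[k]
    return x_bar, u_bar, gains, p


def mpc_sh(x0, T, delta, dt, eps, rng):
    """Shrinking-horizon MPC loop; returns the realised cost of one rollout."""
    n_epochs = round(T / delta)  # N in Algorithm 1
    steps_per_epoch = round(delta / dt)
    x, cost = x0, 0.0
    for i in range(n_epochs):
        remaining = (n_epochs - i) * steps_per_epoch  # horizon shrinks each epoch
        x_bar, u_bar, gains, _ = solve_open_loop(x, remaining, dt)
        for k in range(steps_per_epoch):
            u = u_bar[k] + gains[k] * (x - x_bar[k])  # u = u_bar + K * delta_x
            cost += 0.5 * (Q * x * x + R * u * u) * dt
            noise = eps * np.sqrt(dt) * rng.standard_normal()
            x = x + (A_C * x + G * u) * dt + noise
    return cost + 0.5 * Q_T * x * x


rng = np.random.default_rng(0)
costs = [mpc_sh(1.0, T=1.0, delta=0.1, dt=0.01, eps=0.1, rng=rng) for _ in range(200)]
print("mean cost under noise:", np.mean(costs))
```

In this linear-quadratic stand-in the replanning step is redundant, because the linear feedback is already globally optimal; the example is meant only to show the structure of the loop (solve over the remaining horizon, apply the perturbation feedback for one epoch, replan from the observed state).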
    Remark 8

    In traditional MPC [Mayne_1, Mayne_2], the horizon $N$ used to solve the open-loop problem is fixed. The setting is deterministic, and the necessity of replanning stems from the assumption that the actual problem horizon is infinite and, therefore, computationally intractable. In contrast, our problem horizon is finite, the repeated replanning takes place over progressively shorter horizons, and the need for replanning arises from the stochasticity of the problem. In particular, note that if the system were truly deterministic, there would be no need for replanning.

    Theorem 1

    Near-Optimality of MPC-SH. The MPC feedback policy obtained from the application of the Shrinking Horizon MPC algorithm is near-optimal to $O(\epsilon^{4})$ to the optimal stochastic feedback policy for the stochastic system (2).

    Proof:

    We know that $J_t^{0}(x) = \varphi_t^{0}(x)$ and $J_t^{1}(x) = \varphi_t^{1}(x)$ from Proposition 2, for all $(t,x)$. Owing to the uniqueness and global optimality of the open-loop solution from Proposition 4, it follows that the nominal control sequence, and the associated linear perturbation feedback law, found by the MPC procedure outlined above coincides locally with the optimal deterministic feedback law given any state $x$ and any time $t$. Therefore, the result follows. \hfill$\blacksquare$

    Note that the proof above also shows that the MPC-SH procedure furnishes the optimal deterministic feedback law, as stated in the following corollary.

    Corollary 1

    The MPC-SH algorithm furnishes the optimal deterministic feedback law given any initial condition.

    The result above establishes that repeatedly solving the deterministic optimal control problem from the current state at the decision making epochs results in a near-optimal stochastic policy. We examine two particularly important consequences in the following.

    Stochastic MPC

    A major computational bottleneck with stochastic MPC [Mayne_1] is that the MPC search needs to be over (time-varying) feedback policies rather than control sequences owing to the stochasticity of the problem, which leads to an intractable optimization for nonlinear systems. Because of this intractability, most of the work in stochastic MPC deals with linear systems using the stochastic tube approach [smpc_mesbah2016, smpc_heirung2018], with some more recent work using generalized polynomial chaos (gPC) [kim2013generalised, FISHER2009polychaos]. Nonlinear stochastic MPC using gPC also typically solves over control sequences instead of feedback policies for tractability. However, as our results demonstrate, the MPC feedback law we propose (MPC-SH) is near-optimal to the fourth order. Further, as we have shown analytically in Section III-D and as will be seen from our simulation results in Section V, in practice, the solution of the stochastic DP problem is highly sensitive to noise, quite apart from the usual issue of dimensionality, and MPC-SH gives much better performance than the solution of the stochastic DP problem. A further important practical consequence of Theorem 1 is that we can get performance comparable to MPC by wrapping the optimal linear feedback law around the nominal control sequence ($u_t = \bar{u}_t + K_t\delta x_t$), and replanning the nominal sequence only when the deviation is large enough, similar to the event-driven MPC philosophy [ETMPC1, ETMPC2].

    Reinforcement Learning

    The problems considered in reinforcement learning can be construed as one of finding the optimal feedback policy for a stochastic nonlinear dynamical system [bertsekas1]. Typically, this is done via simulations or rollouts of the dynamical system of interest, which, allied with a suitable function approximator such as a (deep) neural net, yield a nonlinear feedback policy. However, these methods tend to be highly data intensive, slow to converge, and suffer from extremely high variance in the solution since they try to solve the DP equation [RL_conv]. This is a manifestation of the inherent curse of dimensionality in trying to solve the stochastic DP problem. Thus, in our opinion, although the DP equation is an excellent analytical tool to study the structure of the feedback problem, it is not the correct synthesis tool. In fact, it is much easier to repeatedly solve the open-loop problem as prescribed by MPC, i.e., solve for the characteristic curves of the DP problem. Of course, there remains the question of whether we can solve the open-loop problem online. In our opinion, this is feasible today when allied with efficient computational algorithms like iLQR [ILQG_tassa2012synthesis] that exploit the causal structure of optimal control problems, suitable high performance computing (HPC) modifications, and suitable randomization of the computations via rollouts that can help us very efficiently estimate the system parameters involved. In fact, this is the subject of the second part of this paper on data-based control [wang2022search].

    V Simulation Results

    This section provides simulation evidence for the theoretical results derived previously. In Subsection V-A, the inaccuracy of the stochastic solution, as discussed in Remark 6, is shown for a simple 1-D problem in comparison with the deterministic solution. The near-optimality of MPC-SH, which was theoretically shown to be the optimal deterministic solution in Theorem 1, is also compared with the stochastic solution in a nonlinear problem. Further, we show why it is intractable to solve the stochastic HJB accurately. In Subsection V-B, the performance of the optimal linear perturbation feedback derived in Section III-C is compared with MPC-SH on nonlinear robotic problems. The experiments shown in this section are carried out over 500 Monte Carlo simulations, and the performance statistics are computed from these simulations.

    V-A Deterministic vs. Stochastic policy

    In this section, we aim to show through simulations that computing the optimal stochastic feedback law is subject to errors, as explained by the theory discussed previously in Sec. III-D. We show this by comparing, on a nonlinear problem, the performance of the deterministic solution applied to the stochastic problem with that of the stochastic solution. We consider the following problem:

    \begin{align}
    J(0,x(0)) &= \min_{\{u_t\}} \mathbb{E}\left[\frac{1}{2}\left(\int_0^T \big(q\,x(t)^2 + r\,u(t)^2\big)\,dt + q_T\,x(T)^2\right)\right] \tag{38a}\\
    \text{s.t.}~~ dx &= (f(x) + g(x)u)\,dt + \epsilon\,dw, \quad \text{given}~x(0). \tag{38b}
    \end{align}

    The solution to the above problem is calculated by solving the HJB equation (written for the scalar case):

    \begin{equation}
    -\frac{\partial J}{\partial t} = \frac{1}{2}\Big(q x^2 - \frac{g(x)^2}{r}\Big(\frac{\partial J}{\partial x}\Big)^2\Big) + f(x)\frac{\partial J}{\partial x} + \frac{\epsilon^2 Q}{2}\frac{\partial^2 J}{\partial x^2}, \tag{39}
    \end{equation}

    where $J = J(t,x)$ is the expected cost-to-go from state $x$ at time $t$, with terminal condition $J(T,x) = \frac{1}{2} q_T x^2$. The minimizing optimal control is $u = -\frac{1}{r} g(x) \frac{\partial J}{\partial x}$, and we take $q = 100$, $q_T = 500$ and $r = 1$. The noise $w$ added to the system in stochastic cases is zero-mean Gaussian white noise, with standard deviation equal to the maximum value of the control input along the nominal trajectory obtained by solving the deterministic problem ($\sqrt{Q} = \bar{u}_{max}$). The HJB equation in Eq. (39) is solved by the finite difference (FD) method in a fixed domain, since FD is the standard method in the computational fluid dynamics community for solving advection-diffusion PDEs such as Eq. (39) [FD_CFD]. The parameters used in FD are shown in Table I. The time and space discretization was chosen to satisfy the Courant–Friedrichs–Lewy (CFL) conditions [CFL]. We consider only a 1-D problem for ease of illustration, since Eq. (39) becomes computationally intractable to solve for high-dimensional problems; nevertheless, these simple low-dimensional problems clearly illustrate the issues with solving the HJB equation.
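    To make the FD procedure concrete, the following is a minimal sketch of one way to march Eq. (39) backward in time on a fixed grid. The paper does not state which FD stencil was used; the explicit upwind scheme, the edge treatment, the stability check, and the function name `solve_hjb_fd` are all assumptions made for illustration.

```python
import numpy as np

def solve_hjb_fd(f, g, q, r, q_T, eps, Q, x, T, dt):
    """March the scalar HJB backward from t = T to t = 0 on a fixed grid x.

    Returns the cost-to-go J(0, x) and the feedback u(0, x) = -g(x) J_x / r.
    """
    dx = x[1] - x[0]
    D = 0.5 * eps**2 * Q                      # diffusion coefficient eps^2 Q / 2
    fx, gx = f(x), g(x)
    J = 0.5 * q_T * x**2                      # terminal condition J(T, x)
    for _ in range(int(round(T / dt))):
        p = np.gradient(J, dx)                # central estimate of dJ/dx
        u = -gx * p / r                       # minimizing control
        b = fx + gx * u                       # closed-loop drift
        if dt * (np.max(np.abs(b)) / dx + 2.0 * D / dx**2) > 1.0:
            raise ValueError("time step violates the CFL-type stability bound")
        slope = np.diff(J) / dx               # slopes between neighbouring nodes
        fwd = np.append(slope, slope[-1])     # forward difference (edge: reuse last)
        bwd = np.insert(slope, 0, slope[0])   # backward difference (edge: reuse first)
        Jxx = np.zeros_like(J)                # second derivative, set to zero at edges
        Jxx[1:-1] = (J[2:] - 2.0 * J[1:-1] + J[:-2]) / dx**2
        # upwinding: use the one-sided slope on the side the drift points to
        adv = np.maximum(b, 0.0) * fwd + np.minimum(b, 0.0) * bwd
        J = J + dt * (0.5 * q * x**2 + 0.5 * r * u**2 + adv + D * Jxx)
    return J, -gx * np.gradient(J, dx) / r
```

    With the values in the text ($q = 100$, $q_T = 500$, $r = 1$) the gradient $\partial J / \partial x$ is large near the domain edges, which is why a very small time step is needed for an explicit scheme of this kind; the value of $Q$ must be supplied by the user since it is not listed in this section.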

    \begin{tabular}{cccc}
    Domain & $\Delta x$ & $\Delta t$ & $T$ \\
    $[-2,2]$ & $0.02$ & $3.33\times 10^{-6}$ & $1$
    \end{tabular}
    TABLE I: Parameters used in finite difference solution to HJB PDE.

    Nonlinear case

    We consider the nonlinear system $dx = (-\cos(x) + u)\,dt + \epsilon\,dw$ with initial condition $x(0) = 1$. As discussed in Sec. IV, the MPC-SH feedback law is the optimal feedback law for the deterministic problem, and its cost is $O(\epsilon^4)$ near-optimal to the stochastic cost. The algorithm for MPC-SH is given in Algorithm 1. To solve the open-loop optimization problem in MPC-SH, the iterative linear quadratic regulator (ILQR) algorithm is used [ILQG_tassa2012synthesis]. ILQR is used specifically since the converged optimal solution satisfies the necessary conditions of the minimum principle given in Eqs. (18), (19). As discussed in Proposition 4, the deterministic open-loop problem has a unique minimum for our case, and ILQR is guaranteed to converge to it [wang2022search].

    In our experiment, we solve the HJB equation in (39) for a particular value of $\epsilon$ in the domain $[-2,2]$. We take the resulting feedback policy $u = -\frac{1}{r}\frac{\partial J}{\partial x}$ and apply it to the nonlinear system, with the noise acting on the system scaled by the same $\epsilon$ for which the HJB equation was solved. The system is simulated under this feedback policy over the time interval $[0,1]$. We run Monte Carlo simulations of the system for different noise samples of $w$ and obtain the mean and standard deviation of the cost incurred over these runs. The open-loop optimization in MPC-SH is solved using ILQR, as discussed above, for the specific initial condition, and the resulting policy is tested on the stochastic nonlinear system for a given value of $\epsilon$. The experiment is repeated for different noise levels by varying $\epsilon$. The decision epoch chosen for MPC-SH was $\Delta = 0.005$, roughly three orders of magnitude larger than the $\Delta t$ used in FD. The mean and standard deviation of the cost incurred in these experiments are shown in Fig. 1. Fig. 1 shows that the MPC-SH feedback law has comparable performance to the stochastic HJB-FD solution. MPC-SH is also computationally more efficient, as HJB-FD requires a very fine time discretization to avoid numerical issues, even for the 1-D case, owing to the CFL conditions (see Table I). Also, MPC-SH finds an optimal trajectory for a single initial condition, whereas HJB-FD computes the solution over the entire domain, which is computationally expensive.
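    The Monte Carlo evaluation described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the Euler-Maruyama discretization with step `dt`, the argument `sigma_w` standing in for $\sqrt{Q}$ (whose value is not given in this section), and the function name `monte_carlo_cost` are hypothetical. Any feedback law, whether interpolated from an HJB-FD table or produced by an MPC-SH loop, can be passed in as `policy`.

```python
import numpy as np

def monte_carlo_cost(policy, eps, sigma_w=1.0, x0=1.0, T=1.0, dt=0.005,
                     q=100.0, r=1.0, q_T=500.0, n_runs=500, seed=0):
    """Mean and standard deviation of the cost of `policy` on
    dx = (-cos(x) + u) dt + eps dw, estimated over n_runs noise samples."""
    rng = np.random.default_rng(seed)
    x = np.full(n_runs, x0)                   # all runs start from x(0) = x0
    cost = np.zeros(n_runs)
    for k in range(int(round(T / dt))):
        u = policy(k * dt, x)                 # feedback evaluated per run
        cost += 0.5 * (q * x**2 + r * u**2) * dt          # running cost
        w = sigma_w * rng.standard_normal(n_runs)
        x = x + (-np.cos(x) + u) * dt + eps * w * np.sqrt(dt)
    cost += 0.5 * q_T * x**2                  # terminal cost
    return cost.mean(), cost.std()
```

    For example, `monte_carlo_cost(lambda t, x: -10.0 * x, eps=0.2)` evaluates a simple proportional law, which is only a stand-in and not one of the policies compared in the paper.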

    Figure 1: Performance comparison of HJB-FD and MPC-SH on the 1-D nonlinear system for different noise levels. Panels: (a) full noise spectrum; (b) low noise region enhanced.
    Figure 2: (a) Comparison of the expected cost-to-go obtained from the HJB-FD solution and the actual cost incurred by applying the HJB-FD feedback policy on the nonlinear system. The cost-to-go is obtained for the initial condition $x(0) = 1$ and the actual cost is the average cost of 500 simulations. Trajectory samples of the nonlinear system under the MPC-SH policy are shown in (b), and under the HJB-FD policy for two different values of $\epsilon$ in (c) and (d). Panel titles: (a) HJB-FD cost-to-go; (b) MPC-SH, $\epsilon = 0.8$; (c) HJB-FD, $\epsilon = 0.2$; (d) HJB-FD, $\epsilon = 0.8$.

    Even when the deterministic solution given by MPC-SH is applied to the stochastic case, the performance is almost equivalent, owing to the $O(\epsilon^4)$ near-optimality of the deterministic solution to the stochastic one. Moreover, the stochastic policy has higher variance than the deterministic MPC-SH policy at $\epsilon = 0.8$ and fails beyond that, which further indicates that the computed stochastic policy is inaccurate. To illustrate the inaccuracy in the HJB-FD solution, we compare the expected cost-to-go value calculated by solving the HJB with the true cost of operation in Fig. 2a. It can be seen that the cost-to-go becomes inaccurate after $\epsilon = 0.6$. The reason for the inaccuracy of the stochastic HJB-FD solution is illustrated in Figs. 2c and 2d. The plots show the trajectories taken by the system under the HJB-FD feedback policy for different values of $\epsilon$. When the value of $\epsilon$ parametrizing the strength of the noise becomes large, the trajectories leave the domain on which the solution is obtained, due to the noise acting on the system. Since the cost-to-go solution is unavailable outside the domain, one has to approximate the cost of these trajectories with the cost at the boundary. To get an accurate solution, the domain over which one solves needs to expand with time. Since most computational methods use a fixed-domain approximation, the stochastic solution obtained will inherently be inaccurate, because, as the noise intensity increases, the states inside the boundary need the cost-to-go values of states outside the domain. Expanding the domain makes the problem more computationally expensive, and trajectories will still leave the domain in high-noise cases.
    In contrast, MPC-SH does not face the issue of computational inaccuracy when a trajectory exits the boundary, since it can compute a new trajectory from any given state without regard to the boundary and the boundary conditions required by HJB-FD. This may be construed as the primary computational benefit of using the MPC-SH approach. In the deterministic case, owing to the absence of noise, the control takes the system towards the origin and not outside the domain, so the deterministic cost-to-go and feedback policy are always accurate. Furthermore, note that when the stochastic HJB solution is accurate, the system does not leave the domain, owing to the control dominating the effect of the noise.
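    The domain-exit effect discussed above can be checked numerically by counting how many simulated trajectories leave the fixed domain. The sketch below is only illustrative: the proportional feedback `u = -gain * x` is a stand-in for the actual HJB-FD policy, and `sigma_w`, `gain`, and the function name `exit_fraction` are hypothetical choices, not values from the paper.

```python
import numpy as np

def exit_fraction(eps, sigma_w=10.0, gain=10.0, x0=1.0, T=1.0, dt=0.005,
                  domain=(-2.0, 2.0), n_runs=500, seed=0):
    """Fraction of runs of dx = (-cos(x) + u) dt + eps dw that ever leave `domain`."""
    rng = np.random.default_rng(seed)
    x = np.full(n_runs, x0)
    exited = np.zeros(n_runs, dtype=bool)
    for _ in range(int(round(T / dt))):
        u = -gain * x                          # stand-in feedback law (assumed)
        w = sigma_w * rng.standard_normal(n_runs)
        x = x + (-np.cos(x) + u) * dt + eps * w * np.sqrt(dt)
        exited |= (x < domain[0]) | (x > domain[1])   # remember any exit
    return exited.mean()
```

    Comparing `exit_fraction(0.2)` with `exit_fraction(0.8)` gives a rough indication of how quickly a fixed-domain solution stops covering the states that the noisy system actually visits.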

    V-B Comparison between MPC-SH and Optimal Linear feedback

    In this section, we compare the performance of two different deterministic feedback laws: the optimal linear feedback and the MPC-SH feedback law. In Remark 3, it was shown that the optimal linear feedback controller given by Eqs. (27)-(29), designed around the optimal open-loop nominal trajectory, is also near-optimal to the order of $O(\epsilon^4)$ for the stochastic system. This design is referred to as the trajectory-optimized perturbation feedback controller (T-PFC) [parunandi2019TPFC]. The difference between T-PFC and MPC-SH is that T-PFC plans the nominal trajectory only once, from the initial state, and uses the linear feedback to correct for errors during its execution. In contrast, MPC-SH replans the nominal trajectory from the current state continuously and uses the linear feedback only for a short interval $\Delta$ between the replans. The advantage of using T-PFC is that the open-loop optimization has to be carried out only once (preferably offline), and the precomputed linear feedback gains can be used to correct for deviations due to uncertainty online. In a stochastic setting, the nominal trajectory generated initially remains optimal only if the system stays close to the nominal. If it deviates, the trajectory has to be replanned from the current state, as done by MPC-SH, to maintain optimal performance. We examine how the performance of T-PFC compares with MPC-SH in nonlinear robotics problems, namely the car-like robot and the cart-pole system, for different noise levels in Fig. 3.

    In Fig. 3, we see that T-PFC shows comparable performance to MPC-SH for low values of $\epsilon$. As noise increases, the trajectory deviates from the nominal computed initially, and the feedback policy is no longer optimal, necessitating a replanned nominal trajectory from the current state. Hence, the performance of T-PFC deteriorates for high noise levels. Nevertheless, T-PFC-like deterministic feedback laws are valuable in applications that need to minimize onboard computing and operate in low-noise settings.

    Figure 3: Performance comparison of T-PFC with MPC-SH in nonlinear robotics systems: (a) car-like robot; (b) cart-pole. Both policies are computed for a specific initial condition and tested on 500 different samples for each value of $\epsilon$ to find the cost statistics. The car-like robot considered is a 4-D system governed by $\dot{x} = v\cos\theta$, $\dot{y} = v\sin\theta$, $\dot{\theta} = \frac{v}{L}\tan\phi$, $\dot{\phi} = \omega$, where $v, \omega$ are the control inputs and $L$ is the length of the car. The cart-pole is also a 4-D system and is governed by $(M+m)\ddot{x} - mL\dot{\theta}^2\sin\theta + mL\ddot{\theta}\cos\theta = F$, $mL^2\ddot{\theta} + mL\ddot{x}\cos\theta + mgL\sin\theta = 0$, where $F$ is the control input, and $M, m, L$ are the mass of the cart, the mass of the pole, and the length of the pole, respectively. Process noise was added to the above systems after propagating the dynamics at every time step. The standard deviation of the noise added was the maximum value of the states in the optimal nominal trajectory.
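    As an illustration of the car-like robot model in the caption of Fig. 3 and of adding process noise after propagating the dynamics, a minimal sketch is given below. The forward-Euler integration, the $\sqrt{\Delta t}$ noise scaling (borrowed from the discrete-time form used in the appendix), and the names `car_step`, `noisy_car_step`, and `sigma` are assumptions; the paper does not list its integrator or parameter values here.

```python
import numpy as np

def car_step(state, control, dt, L=1.0):
    """One forward-Euler step of the car-like robot; state = (x, y, theta, phi)."""
    x, y, theta, phi = state
    v, omega = control                        # speed and steering-rate inputs
    deriv = np.array([v * np.cos(theta),
                      v * np.sin(theta),
                      v / L * np.tan(phi),
                      omega])
    return np.asarray(state, dtype=float) + dt * deriv

def noisy_car_step(state, control, dt, eps, sigma, rng, L=1.0):
    """Propagate the dynamics first, then add zero-mean Gaussian process noise."""
    nxt = car_step(state, control, dt, L)
    return nxt + eps * sigma * np.sqrt(dt) * rng.standard_normal(4)
```

    Here `sigma` plays the role of the noise standard deviation, which the caption sets to the maximum value of the states along the optimal nominal trajectory.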

    V-C Discussion

    The primary takeaway from Sections V-A and V-B is that deterministic policies are not only near-optimal but also accurate, scalable, and repeatable. It is not possible to compute the stochastic policy accurately, as shown in Sec. V-A. Note that the inaccuracy is not a limitation of the finite difference method used. Galerkin finite element methods and collocation methods, such as Chebyshev polynomial-based methods, are also solved on a bounded domain and, consequently, are not immune to the errors observed in FD. As discussed in Sec. IV, random sampling-based methods like approximate dynamic programming and reinforcement learning depend on their samples to explore the domain and inherently have the same issue in the stochastic case. In high-dimensional problems, one needs a prohibitively large number of samples to explore the domain. An inefficient sampling of the domain will lead to inaccurate policies, as the cost-to-go is not accurately captured by the samples. Due to this issue, there is an inherent variance in the solution obtained by such methods [RL_conv]. We have done an exhaustive investigation comparing the deterministic feedback approach with other RL methods in the companion paper [wang2022search], where we report the accuracy, scalability, efficiency, and repeatability of the deterministic policy that the stochastic RL methods lack. To summarize, as shown in Fig. 2, the regime where the stochastic solutions can be computed accurately is the low-noise regime, where the deterministic solution gives near-identical performance; consequently, in practice, the deterministic feedback is sufficient.

    VI Conclusion

    In this paper, we have considered the problem of stochastic nonlinear control. We have shown that recursively solving the deterministic optimal control problem from the current state, à la MPC, results in a policy that is near-optimal to fourth order in a small noise parameter. In practice, empirical evidence shows that the MPC law performs better than the law obtained by computationally solving the stochastic DP problem, owing to the perturbation structure of the deterministic optimal control problem. An important current limitation is the requirement that the nominal trajectory be smooth enough for suitable Taylor expansions to be possible. This breaks down when trajectories are non-smooth, such as in hybrid systems like legged robots, or when maneuvers have kinks, such as for car-like robots in a tight parking application. It remains to be seen if, and how, one may extend the result to such applications that are piecewise smooth in the dynamics. Also, a further careful investigation into the relative merits and demerits of the shrinking horizon approach to MPC when compared to the traditional fixed horizon approach is required, as is the generalization to the more practical and important partially observed problem.

    Appendix A DETAILED PROOFS OF RESULTS

    A-A Proof of Lemma 1

    Proof:

    We proceed by induction. The first general instance of the recursion occurs at $k = 3$. It can be shown that:

    \begin{align*}
    \delta x_3 &= \underbrace{\big(\bar{A}_2 \bar{A}_1 (\epsilon w_0 \sqrt{\Delta t}) + \bar{A}_2 (\epsilon w_1 \sqrt{\Delta t}) + \epsilon w_2 \sqrt{\Delta t}\big)}_{\delta x_3^l} + e_3, \\
    e_3 &= \bar{A}_2 \bar{S}_1(\epsilon w_0 \sqrt{\Delta t}) + \bar{S}_2\big(\bar{A}_1 (\epsilon w_0 \sqrt{\Delta t}) + \epsilon w_1 \sqrt{\Delta t} + \bar{S}_1(\epsilon w_0 \sqrt{\Delta t})\big).
    \end{align*}

    Noting that S¯1(.)\bar{S}_{1}(.)over¯ start_ARG italic_S end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( . ) and S¯2(.)\bar{S}_{2}(.)over¯ start_ARG italic_S end_ARG start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( . ) are second and higher order terms, it follows that e3subscript𝑒3e_{3}italic_e start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT is O(ϵ2)𝑂superscriptitalic-ϵ2O(\epsilon^{2})italic_O ( italic_ϵ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ).
    Suppose now that δxk=δxkl+ek𝛿subscript𝑥𝑘𝛿superscriptsubscript𝑥𝑘𝑙subscript𝑒𝑘\delta x_{k}=\delta x_{k}^{l}+e_{k}italic_δ italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = italic_δ italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT + italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT where eksubscript𝑒𝑘e_{k}italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT is O(ϵ2)𝑂superscriptitalic-ϵ2O(\epsilon^{2})italic_O ( italic_ϵ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Then: δxk+1=A¯k(δxkl+ek)+ϵwkΔt+S¯k(δxk),=(A¯kδxkl+ϵwkΔt)δxk+1l+{A¯kek+S¯k(δxk)}ek+1.\delta x_{k+1}=\bar{A}_{k}(\delta x_{k}^{l}+e_{k})+\epsilon w_{k}\sqrt{\Delta t% }+\bar{S}_{k}(\delta x_{k}),\\ =\underbrace{(\bar{A}_{k}\delta x_{k}^{l}+\epsilon w_{k}\sqrt{\Delta t})}_{% \delta x_{k+1}^{l}}+\underbrace{\{\bar{A}_{k}e_{k}+\bar{S}_{k}(\delta x_{k})\}% }_{e_{k+1}}.italic_δ italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = over¯ start_ARG italic_A end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_δ italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT + italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) + italic_ϵ italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT square-root start_ARG roman_Δ italic_t end_ARG + over¯ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_δ italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) , = under⏟ start_ARG ( over¯ start_ARG italic_A end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_δ italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT + italic_ϵ italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT square-root start_ARG roman_Δ italic_t end_ARG ) end_ARG start_POSTSUBSCRIPT italic_δ italic_x start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT 
end_POSTSUBSCRIPT + under⏟ start_ARG { over¯ start_ARG italic_A end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + over¯ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_δ italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) } end_ARG start_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT . Noting that S¯ksubscript¯𝑆𝑘\bar{S}_{k}over¯ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT is O(ϵ2)𝑂superscriptitalic-ϵ2O(\epsilon^{2})italic_O ( italic_ϵ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) and that eksubscript𝑒𝑘e_{k}italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT is O(ϵ2)𝑂superscriptitalic-ϵ2O(\epsilon^{2})italic_O ( italic_ϵ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) by assumption, the result follows that ek+1subscript𝑒𝑘1e_{k+1}italic_e start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT is O(ϵ2)𝑂superscriptitalic-ϵ2O(\epsilon^{2})italic_O ( italic_ϵ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ).
    Now, let us take a closer look at the term eksubscript𝑒𝑘e_{k}italic_e start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT and again proceed by induction. It is clear that e1=e1(2)=0subscript𝑒1superscriptsubscript𝑒120e_{1}=e_{1}^{(2)}=0italic_e start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = italic_e start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( 2 ) end_POSTSUPERSCRIPT = 0. Next, it can be seen that e2=A¯1e1(2)+δx1l𝖳S¯1(2)δx1l+O(ϵ3)=(ϵ2Δt)w0𝖳S¯1(2)w0+O(ϵ3)subscript𝑒2subscript¯𝐴1superscriptsubscript𝑒12𝛿superscriptsuperscriptsubscript𝑥1𝑙𝖳superscriptsubscript¯𝑆12𝛿superscriptsubscript𝑥1𝑙𝑂superscriptitalic-ϵ3superscriptitalic-ϵ2Δ𝑡superscriptsubscript𝑤0𝖳superscriptsubscript¯𝑆12subscript𝑤0𝑂superscriptitalic-ϵ3e_{2}=\bar{A}_{1}e_{1}^{(2)}+{\delta x_{1}^{l}}^{\mathsf{T}}\bar{S}_{1}^{(2)}% \delta x_{1}^{l}+O(\epsilon^{3})=(\epsilon^{2}\Delta t){w_{0}}^{\mathsf{T}}% \bar{S}_{1}^{(2)}w_{0}+O(\epsilon^{3})italic_e start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = over¯ start_ARG italic_A end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( 2 ) end_POSTSUPERSCRIPT + italic_δ italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT start_POSTSUPERSCRIPT sansserif_T end_POSTSUPERSCRIPT over¯ start_ARG italic_S end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( 2 ) end_POSTSUPERSCRIPT italic_δ italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT + italic_O ( italic_ϵ start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT ) = ( italic_ϵ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_Δ italic_t ) italic_w start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT sansserif_T end_POSTSUPERSCRIPT over¯ start_ARG italic_S end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( 2 ) end_POSTSUPERSCRIPT italic_w start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_O ( italic_ϵ start_POSTSUPERSCRIPT 3 
end_POSTSUPERSCRIPT ), which shows the recursion is valid for k=2𝑘2k=2italic_k = 2 given it is so for k=1𝑘1k=1italic_k = 1.
    Suppose that it is true for $k$. Then:
    \begin{align*}
    \delta x_{k+1} &= \bar{A}_k \delta x_k + \bar{S}_k(\delta x_k) + \epsilon w_k \sqrt{\Delta t} \\
    &= \bar{A}_k(\delta x_k^l + e_k) + \bar{S}_k(\delta x_k^l + e_k) + \epsilon w_k \sqrt{\Delta t} \\
    &= \underbrace{(\bar{A}_k \delta x_k^l + \epsilon w_k \sqrt{\Delta t})}_{\delta x_{k+1}^l} + \underbrace{\bar{A}_k e_k^{(2)} + {\delta x_k^l}^{\mathsf{T}} \bar{S}_k^{(2)} \delta x_k^l}_{e_{k+1}^{(2)}} + O(\epsilon^3),
    \end{align*}
    where the last line follows because $e_k = e_k^{(2)} + O(\epsilon^3)$, and $\bar{S}_k^{(2)}$ is the second order term of $\bar{S}_k(\cdot)$. This completes the induction and the proof. \hfill$\blacksquare$
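
    As a brief sketch of the order bookkeeping behind the last line of the induction step, assume that $\delta x_k^l = O(\epsilon)$, that $e_k = O(\epsilon^2)$, and that $\bar{S}_k(\cdot)$ contains only second and higher order terms in its argument (these orders are taken here as working assumptions). Then:
    \begin{align*}
    \bar{A}_k e_k &= \bar{A}_k e_k^{(2)} + O(\epsilon^3), \\
    \bar{S}_k(\delta x_k^l + e_k) &= {(\delta x_k^l + e_k)}^{\mathsf{T}} \bar{S}_k^{(2)} (\delta x_k^l + e_k) + O(\epsilon^3) \\
    &= {\delta x_k^l}^{\mathsf{T}} \bar{S}_k^{(2)} \delta x_k^l + \underbrace{{\delta x_k^l}^{\mathsf{T}} \bar{S}_k^{(2)} e_k + e_k^{\mathsf{T}} \bar{S}_k^{(2)} \delta x_k^l}_{O(\epsilon^3)} + \underbrace{e_k^{\mathsf{T}} \bar{S}_k^{(2)} e_k}_{O(\epsilon^4)} + O(\epsilon^3).
    \end{align*}
    Only $\bar{A}_k e_k^{(2)}$ and ${\delta x_k^l}^{\mathsf{T}} \bar{S}_k^{(2)} \delta x_k^l$ are of order $\epsilon^2$, which is why they alone make up $e_{k+1}^{(2)}$.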

    A-B Proof of Lemma 2

    Proof:

    We have that:
    \begin{align*}
    \mathcal{J}^{\pi} &= \sum_k \bar{c}_k + \sum_k \bar{C}_k(\delta x_k^l + e_k) + \sum_k \bar{H}_k(\delta x_k^l + e_k) \\
    &= \sum_k \bar{c}_k + \sum_k \bar{C}_k \delta x_k^l + \sum_k \delta x_k^{l'} \bar{H}_k^{(2)} \delta x_k^l + \bar{C}_k e_k^{(2)} + O(\epsilon^3),
    \end{align*}
    where the last line of the equation above follows from an application of Lemma 1. \hfill$\blacksquare$

    A-C Proof of Proposition 1

    In order to prove this result, we first need the following preparatory result. Consider the following deterministic continuous time system:

    \begin{align*}
    J^{\pi}(0, x_0) &= \int_0^T \underbrace{c(x_t, \pi_t(x_t))}_{\bar{c}(t, x_t)} \, dt + c_T(x_T), \\
    \dot{x} &= \underbrace{f(x) + g(x)\pi_t(x)}_{\bar{f}(t, x)} + \epsilon v,
    \end{align*}

    where $v(t)$ is a given continuous time input. We rewrite the above policy evaluation equation in state-space form as follows:
    \[
    \dot{x} = \bar{f}(t, x) + \epsilon v, \quad \dot{R} = \bar{c}(t, x), \quad \dot{t} = 1, \quad Z(t) = R(t) + c_T(x),
    \]
    where the above equations can now be expressed in a time-invariant state space form as $\dot{X} = F(X) + \epsilon G v$ and $Z(t) = H(X(t))$, where $X = [x, R, t]'$, $F = [\bar{f}(t, x), \bar{c}(t, x), 1]'$, $G = [I_n, 0, 0]'$ and $H(X) = R + c_T(x)$.
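
    As a sketch, the augmented system can be written out blockwise. The initial conditions $R(0) = 0$ and $t(0) = 0$ used below are an assumption made here for illustration (they are not stated explicitly in the text); under them the output at the final time reproduces the cost:
    \begin{align*}
    \frac{d}{dt} \begin{bmatrix} x \\ R \\ t \end{bmatrix} &= \begin{bmatrix} \bar{f}(t, x) \\ \bar{c}(t, x) \\ 1 \end{bmatrix} + \epsilon \begin{bmatrix} I_n \\ 0 \\ 0 \end{bmatrix} v, \qquad X(0) = \begin{bmatrix} x_0 \\ 0 \\ 0 \end{bmatrix}, \\
    Z(T) &= R(T) + c_T(x_T) = \int_0^T \bar{c}(t, x_t) \, dt + c_T(x_T) = J^{\pi}(0, x_0).
    \end{align*}
    Carrying $t$ as a state is what removes the explicit time dependence of $\bar{f}$ and $\bar{c}$, so that $F$ depends on $X$ alone.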
    Given that the component functions $f(\cdot),~g(\cdot),~c(\cdot,\cdot),~c_T(\cdot),~\pi_t(\cdot)$ are five times continuously differentiable ($\mathcal{C}^5$) in their arguments (assumption A2), the output $Z(T) = J^{\pi}(0, x_0)$ can be expressed in terms of the inputs $v(t)$ as the unique Volterra series (Theorem 2.5 in \cite{Volterra_Krener}), where we have suppressed the dependence on $\pi$ for notational convenience:

    \begin{align}
    Z(T) &= J^{(0)}(x_0) + \epsilon \int_0^T J^{(1)}(T, s) v(s) \, ds \nonumber \\
    &\quad + \epsilon^2 \int_0^T \int_0^{s_1} J^{(2)}(T, s_1, s_2) v(s_1) v(s_2) \, ds_2 \, ds_1 \nonumber \\
    &\quad + \epsilon^3 \int_0^T \int_0^{s_1} \int_0^{s_2} J^{(3)}(T, s_1, s_2, s_3) v(s_3) v(s_2) v(s_1) \, ds_3 \, ds_2 \, ds_1 \nonumber \\
    &\quad + \epsilon^4 \int_0^T \int_0^{s_1} \int_0^{s_2} \int_0^{s_3} J^{(4)}(T, s_1, s_2, s_3, s_4) \big[ v(s_4) v(s_3) v(s_2) v(s_1) \big] \, ds_4 \, ds_3 \, ds_2 \, ds_1 + \mathcal{G}, \tag{40}
    \end{align}

    where the Volterra kernels $J^{(k)}(\cdot)$ are unique and continuous in their arguments, and $\mathcal{G}$ is an $o(\epsilon^4)$ function.

    Proof:

    We show the result for a scalar input; the generalization to a vector input is straightforward. We first write the sample path cost in an input-output fashion in the discrete time case. Let $v(t)$ be a given input sequence, and given a discretization time $\Delta t$ such that $N = T/\Delta t$, let $v_k = v(k\Delta t)$, $k = 0, 1, 2, \cdots, N-1$, denote a piecewise constant approximation of the input. Under A2, the cost of any sample path from a given initial state $x_0$ can be expanded as follows in discrete time (where we have suppressed the explicit dependence of the different terms on $x_0$ to simplify notation):
    \[
    \mathcal{V}^{\pi}_N = \mathcal{V}^{\pi,0}_N + \epsilon \mathcal{V}^{\pi,1}_N + \epsilon^2 \mathcal{V}^{\pi,2}_N + \epsilon^3 \mathcal{V}^{\pi,3}_N + \epsilon^4 \mathcal{V}^{\pi,4}_N + \mathcal{G}_N^{\pi},
    \]
    where $\mathcal{V}^{\pi,0}_N$ represents the nominal (zero input) cost and

    \begin{align*}
    \mathcal{V}^{\pi,1}_N &= \sum_{s=0}^{N-1} \mathcal{J}^{(1)}_N(N\Delta t, s\Delta t) \, v_s \, \Delta t, \\
    \mathcal{V}^{\pi,2}_N &= \sum_{s_1=0}^{N-1} \sum_{s_2=0}^{s_1} \mathcal{J}^{(2)}_N(N\Delta t, s_1\Delta t, s_2\Delta t) \, v_{s_2} v_{s_1} \, \Delta t^2, \\
    \mathcal{V}^{\pi,3}_N &= \sum_{s_1=0}^{N-1} \sum_{s_2=0}^{s_1} \sum_{s_3=0}^{s_2} \mathcal{J}^{(3)}_N(N\Delta t, s_1\Delta t, s_2\Delta t, s_3\Delta t) \\
    &\quad\quad \times v_{s_3} v_{s_2} v_{s_1} \, \Delta t^3, \\
    \mathcal{V}^{\pi,4}_N &= \sum_{s_1=0}^{N-1} \sum_{s_2=0}^{s_1} \sum_{s_3=0}^{s_2} \sum_{s_4=0}^{s_3} \mathcal{J}^{(4)}_N(N\Delta t, s_1\Delta t, s_2\Delta t, s_3\Delta t, s_4\Delta t) \\
    &\quad\quad \times v_{s_4} v_{s_3} v_{s_2} v_{s_1} \, \Delta t^4,
    \end{align*}

    where $\mathcal{J}^{(k)}_N(\cdot)$ represent the piecewise constant discretized kernels corresponding to the Volterra kernels defined in (40). Further, the remainder function $\mathcal{G}_N^{\pi}$ is an $o(\epsilon^4)$ function.
    Let $V^{\pi}(x_0)$ denote the cost of the trajectory under the continuous time input $v(t)$. Then it follows that $\mathcal{V}^{\pi}_N(x_0) \rightarrow V^{\pi}(x_0)$ as $N \rightarrow \infty$, regardless of the input sequence $v(t)$. Therefore, it follows that the discretized piecewise constant kernels $\mathcal{J}^{(k)}_N \rightarrow J^{(k)}$ in the $L_1$ sense as $N \rightarrow \infty$.
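
    To illustrate the correspondence between the discrete sums and the Volterra integrals in (40), consider the first-order term. This is only a sketch, assuming that $J^{(1)}$ and $v$ are continuous and that $\mathcal{J}^{(1)}_N \rightarrow J^{(1)}$, so that the Riemann sum converges with $\Delta t = T/N$:
    \begin{align*}
    \mathcal{V}^{\pi,1}_N = \sum_{s=0}^{N-1} \mathcal{J}^{(1)}_N(N\Delta t, s\Delta t) \, v_s \, \Delta t \;\longrightarrow\; \int_0^T J^{(1)}(T, s) \, v(s) \, ds \quad \text{as } N \rightarrow \infty.
    \end{align*}
    The higher-order terms follow the same pattern, with $\Delta t^k$ playing the role of $ds_k \cdots ds_1$ over the ordered region $s_1 \geq s_2 \geq \cdots \geq s_k$.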
    If the inputs were a discretized Wiener sequence $\omega(k\Delta t) = w_k \sqrt{\Delta t}$, where $w_k$ is a Gaussian white noise sequence, we can write the cost of a sample path as:
    \[
    \mathcal{J}^{\pi}_N = \mathcal{J}^{\pi,0}_N + \epsilon \mathcal{J}^{\pi,1}_N + \epsilon^2 \mathcal{J}^{\pi,2}_N + \epsilon^3 \mathcal{J}^{\pi,3}_N + \epsilon^4 \mathcal{J}^{\pi,4}_N + \mathcal{R}_N^{\pi},
    \]
    where $\mathcal{J}^{\pi,0}_N$ is the zero noise cost and

    \begin{align*}
    \mathcal{J}^{\pi,1}_N &= \sum_{s=0}^{N-1} \mathcal{J}^{(1)}_N(N\Delta t, s\Delta t) \, w_s \sqrt{\Delta t}, \\
    \mathcal{J}^{\pi,2}_N &= \sum_{s_1=0}^{N-1} \sum_{s_2=0}^{s_1} \mathcal{J}^{(2)}_N(N\Delta t, s_1\Delta t, s_2\Delta t) \, w_{s_2} w_{s_1} \, \Delta t, \\
    \mathcal{J}^{\pi,3}_N &= \sum_{s_1=0}^{N-1} \sum_{s_2=0}^{s_1} \sum_{s_3=0}^{s_2} \Big( \mathcal{J}^{(3)}_N(N\Delta t, s_1\Delta t, s_2\Delta t, s_3\Delta t) \\
    &\quad\quad w_{s_3} w_{s_2} w_{s_1} (\Delta t)^{3/2} \Big), \\
    \mathcal{J}^{\pi,4}_N &= \sum_{s_1=0}^{N-1} \sum_{s_2=0}^{s_1} \sum_{s_3=0}^{s_2} \sum_{s_4=0}^{s_3} \Big( \mathcal{J}^{(4)}_N(N\Delta t, s_1\Delta t, s_2\Delta t, s_3\Delta t, s_4\Delta t) \\
    &\quad\quad w_{s_4} w_{s_3} w_{s_2} w_{s_1} (\Delta t)^{2} \Big).
    \end{align*}
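
    The expectations of the even-order sums can be sketched as follows, under the assumption (made here for illustration) that the $w_k$ are independent standard normal variables, so that $E[w_i w_j] = \delta_{ij}$ and, by Isserlis' theorem, $E[w_a w_b w_c w_d] = \delta_{ab}\delta_{cd} + \delta_{ac}\delta_{bd} + \delta_{ad}\delta_{bc}$:
    \begin{align*}
    E[\mathcal{J}^{\pi,2}_N] &= \sum_{s=0}^{N-1} \mathcal{J}^{(2)}_N(N\Delta t, s\Delta t, s\Delta t) \, \Delta t, \\
    E[\mathcal{J}^{\pi,4}_N] &= \sum_{s=0}^{N-1} \sum_{r=0}^{s} \mathcal{J}^{(4)}_N(N\Delta t, s\Delta t, s\Delta t, r\Delta t, r\Delta t) \, (\Delta t)^2 \\
    &\quad + 2 \sum_{s=0}^{N-1} \mathcal{J}^{(4)}_N(N\Delta t, s\Delta t, s\Delta t, s\Delta t, s\Delta t) \, (\Delta t)^2.
    \end{align*}
    On the ordered index set $s_1 \geq s_2 \geq s_3 \geq s_4$, the pairings $\delta_{s_1 s_3}\delta_{s_2 s_4}$ and $\delta_{s_1 s_4}\delta_{s_2 s_3}$ force all four indices to coincide, which gives the last sum; it has $N$ terms of size $O(\Delta t^2)$ and is therefore $O(\Delta t)$ for bounded kernels.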

    Moreover, due to the whiteness of the noise sequence $\{w_k\}$, it follows that $E[\mathcal{J}^{\pi,1}_N] = 0$ and $E[\mathcal{J}^{\pi,3}_N] = 0$, since these terms are made up of products of an odd number of noise terms, while $E[\mathcal{J}^{\pi,2}_N]$ and $E[\mathcal{J}^{\pi,4}_N]$ are both finite owing to the finiteness of the moments of the noise values. Next, taking the limit of the terms above as $N \rightarrow \infty$, we obtain:

\begin{align*}
\lim_{N\rightarrow\infty} E[\mathcal{J}^{\pi,2}_{N}] &= \int_{0}^{T} J^{(2)}(T,t,t)\,dt \equiv J^{\pi,1} < \infty, \\
\lim_{N\rightarrow\infty} E[\mathcal{J}^{\pi,4}_{N}] &= \int_{0}^{T}\int_{0}^{t} J^{(4)}(T,t,t,\tau,\tau)\,d\tau\,dt \equiv J^{\pi,2} < \infty,
\end{align*}

where the first equality in each line follows from the convergence of the discretized kernels $\mathcal{J}^{(k)}_{N}\rightarrow J^{(k)}$ for $k=2,4$, while the integrals are finite owing to the continuity of the functions $J^{(2)}$ and $J^{(4)}$ as established in (40). Further, $\lim_{\epsilon\rightarrow 0}\epsilon^{-4}\lim_{N\rightarrow\infty}E[\mathcal{R}_{N}^{\pi}]=\lim_{N\rightarrow\infty}E[\lim_{\epsilon\rightarrow 0}\epsilon^{-4}\mathcal{R}_{N}^{\pi}]=0$, i.e., $\lim_{N\rightarrow\infty}E[\mathcal{R}_{N}^{\pi}]$ is $o(\epsilon^{4})$. Therefore, taking expectations on both sides, we obtain
\[
\lim_{N\rightarrow\infty}E[\mathcal{J}^{\pi}_{N}] = J^{\pi,0} + \epsilon^{2}J^{\pi,1} + \epsilon^{4}J^{\pi,2} + o(\epsilon^{4}),
\]
where $J^{\pi,0}=\lim_{N\rightarrow\infty}\mathcal{J}^{\pi,0}_{N}$, which proves the first part of the result.

Next, from Lemma 2, as we take the limit $\Delta t\rightarrow 0$, it is clear that $J^{\pi,0}$ stems solely from the continuous-time nominal trajectory, and that $J^{\pi,1}$ depends on the continuous-time nominal trajectory and the linear closed-loop feedback. Therefore, the result follows.\hfill$\blacksquare$
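The vanishing of the odd-order terms in the proof above, and the collapse of the fourth-order sum onto paired time indices, can be checked with a small sketch. It assumes, beyond what the proof states, that the $w_k$ are independent, zero-mean, unit-variance Gaussian samples, so that Isserlis' (Wick's) theorem applies; the function name `white_noise_moment` is hypothetical and introduced only for this illustration.

```python
def white_noise_moment(indices):
    """Return E[w_{i1} w_{i2} ... w_{ik}] for white Gaussian noise.

    Assumption (not from the paper): the w_i are independent, zero-mean,
    unit-variance Gaussians, so E[w_i w_j] = 1 if i == j and 0 otherwise.
    By Isserlis' theorem the moment equals the number of ways to split the
    index list into pairs of equal indices.
    """
    idx = list(indices)
    if not idx:
        return 1  # empty product
    if len(idx) % 2 == 1:
        return 0  # odd-order products have zero mean
    first, rest = idx[0], idx[1:]
    total = 0
    for k, other in enumerate(rest):
        if other == first:
            # Pair `first` with rest[k] and recurse on the remaining indices.
            total += white_noise_moment(rest[:k] + rest[k + 1:])
    return total


if __name__ == "__main__":
    for pattern in [(1,), (1, 1), (2, 2, 2), (5, 5, 2, 2), (3, 2, 2, 1), (4, 4, 4, 4)]:
        print(pattern, white_noise_moment(pattern))
```

Under this assumption, for ordered indices $s_1\geq s_2\geq s_3\geq s_4$ the fourth-order moment is nonzero only when $s_1=s_2$ and $s_3=s_4$, which is consistent with the kernel arguments $(T,t,t,\tau,\tau)$ appearing in the limit of $E[\mathcal{J}^{\pi,4}_{N}]$.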

    A-D Proof of Proposition 2

    Proof:

Using Proposition 1, we know that any cost function, and hence the optimal cost-to-go function $J(t,x)$, can be expanded as:

\begin{equation}
J = J^{0} + \epsilon^{2}J^{1} + \epsilon^{4}J^{2} + \cdots. \tag{41}
\end{equation}

Consider the HJB in Eq. (4) and substitute the minimizing control $u=-R^{-1}\mathcal{G}(x)^{\mathsf{T}}J^{x}$ (Eq. (5)). This gives the PDE

\begin{align}
-\frac{\partial J}{\partial t} ={}& \bar{l} + \frac{1}{2}(J^{x})^{\mathsf{T}}\bar{\mathcal{G}}R^{-1}\bar{\mathcal{G}}^{\mathsf{T}}J^{x} + (J^{x})^{\mathsf{T}}(\bar{\mathcal{F}} - \bar{\mathcal{G}}R^{-1}\bar{\mathcal{G}}^{\mathsf{T}}J^{x}) \nonumber\\
& + \frac{\epsilon^{2}}{2}\,\mathrm{tr}(J^{xx}), \tag{42}
\end{align}

with terminal condition $J(T,x)=c_{T}(x)$. Here, $\bar{l}=l(x)$, $\bar{\mathcal{F}}=\mathcal{F}(x)$, $\bar{\mathcal{G}}=\mathcal{G}(x)$, and $\mathrm{tr}(\cdot)$ is the trace operator. Substituting Eq. (41) into Eq. (42), we obtain:

\begin{align}
\Big(-\frac{\partial J^{0}}{\partial t} - \epsilon^{2}\frac{\partial J^{1}}{\partial t} - \epsilon^{4}\frac{\partial J^{2}}{\partial t} + \cdots\Big) ={}& \bar{l} + \frac{1}{2}(J^{0,x}+\epsilon^{2}J^{1,x}+\cdots)^{\mathsf{T}}\bar{\mathcal{G}}R^{-1}\bar{\mathcal{G}}^{\mathsf{T}}(J^{0,x}+\epsilon^{2}J^{1,x}+\cdots) \nonumber\\
& + (J^{0,x}+\epsilon^{2}J^{1,x}+\cdots)^{\mathsf{T}}\Big(\bar{\mathcal{F}} - \bar{\mathcal{G}}R^{-1}\bar{\mathcal{G}}^{\mathsf{T}}(J^{0,x}+\epsilon^{2}J^{1,x}+\cdots)\Big) \nonumber\\
& + \frac{\epsilon^{2}}{2}\,\mathrm{tr}(J^{0,xx}+\epsilon^{2}J^{1,xx}+\cdots). \tag{43}
\end{align}

Now, we equate the $\epsilon^{0}$, $\epsilon^{2}$, $\ldots$ terms on both sides to obtain perturbation equations for the cost functions $J^{0}, J^{1}, J^{2}, \ldots$.
First, let us consider the $\epsilon^{0}$ term. Utilizing Eq. (43) above, we obtain:

\begin{align}
-\frac{\partial J^{0}}{\partial t} ={}& \bar{l} + \frac{1}{2}(J^{0,x})^{\mathsf{T}}\bar{\mathcal{G}}R^{-1}\bar{\mathcal{G}}^{\mathsf{T}}(J^{0,x}) \nonumber\\
& + (J^{0,x})^{\mathsf{T}}\underbrace{(\bar{\mathcal{F}} - \bar{\mathcal{G}}R^{-1}\bar{\mathcal{G}}^{\mathsf{T}}J^{0,x})}_{\bar{f}^{0}}, \tag{44}
\end{align}

with the terminal condition $J^{0}(T,x)=c_{T}(x)$.
Similarly, one can obtain the $J^{1}$ equation by equating the $O(\epsilon^{2})$ terms in Eq. (43), which, after regrouping and cancelling some of the terms, yields:

\begin{equation}
-\frac{\partial J^{1}}{\partial t} = (J^{1,x})^{\mathsf{T}}\underbrace{(\bar{\mathcal{F}} - \bar{\mathcal{G}}R^{-1}\bar{\mathcal{G}}^{\mathsf{T}}J^{0,x})}_{=\bar{f}^{0}} + \frac{1}{2}\,\mathrm{tr}(J^{0,xx}), \tag{45}
\end{equation}

with terminal boundary condition $J^{1}(T,x)=0$. Note the perturbation structure of Eqs. (44) and (45): $J^{0}(t,x)$ can be solved without knowledge of $J^{1}(t,x), J^{2}(t,x)$, etc., while $J^{1}(t,x)$ requires knowledge only of $J^{0}(t,x)$, and so on. In other words, the equations can be solved sequentially rather than simultaneously.
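As an illustration of this sequential structure, consider a scalar linear-quadratic instance. The specific choices $\mathcal{F}(x)=ax$, $\mathcal{G}(x)=b$, $l(x)=\tfrac{1}{2}\alpha x^{2}$, $c_{T}(x)=\tfrac{1}{2}\alpha_{T}x^{2}$ and $R=r>0$ are assumptions made here purely for the example and are not part of the original development. Substituting the ansatz $J^{0}(t,x)=\tfrac{1}{2}P(t)x^{2}$ into the $\epsilon^{0}$ equation (44) gives a Riccati equation in $P$ alone:

\begin{equation*}
-\dot{P} = \alpha + 2aP - \frac{b^{2}}{r}P^{2}, \qquad P(T)=\alpha_{T}.
\end{equation*}

Since $J^{0,xx}=P(t)$ does not depend on $x$ and $J^{1}(T,x)=0$, the $\epsilon^{2}$ equation (45) is solved by a function of time only:

\begin{equation*}
-\frac{dJ^{1}}{dt} = \frac{1}{2}P(t), \quad J^{1}(T)=0 \qquad\Longrightarrow\qquad J^{1}(t)=\frac{1}{2}\int_{t}^{T}P(s)\,ds.
\end{equation*}

In this sketch, $P$ is computed first with no reference to $J^{1}$, and $J^{1}$ then follows from a quadrature of $P$, mirroring the sequential solvability noted above.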

Now, let us consider the deterministic HJB equation in Eq. (6). Recall that $\phi(t,x)$ represents the optimal cost-to-go of the deterministic problem, and $u^{d}=-R^{-1}\bar{\mathcal{G}}^{\mathsf{T}}\phi^{x}$ is the deterministic policy, analogous to the stochastic case. Substituting $u^{d}$ in Eq. (6) gives

\begin{equation}
-\frac{\partial\phi}{\partial t} = \bar{l} + \frac{1}{2}(\phi^{x})^{\mathsf{T}}\bar{\mathcal{G}}R^{-1}\bar{\mathcal{G}}^{\mathsf{T}}\phi^{x} + (\phi^{x})^{\mathsf{T}}(\bar{\mathcal{F}} - \bar{\mathcal{G}}R^{-1}\bar{\mathcal{G}}^{\mathsf{T}}\phi^{x}), \tag{46}
\end{equation}

with terminal condition $\phi(T,x)=c_{T}(x)$.

Next, let $\varphi(t,x)$ denote the cost-to-go of the deterministic policy $u^{d}(\cdot)$ when applied to the stochastic system, i.e., Eq. (1) with $\epsilon>0$. Then $\varphi$ satisfies:

\begin{align}
-\frac{\partial\varphi}{\partial t} ={}& \bar{l} + \frac{1}{2}(\phi^{x})^{\mathsf{T}}\bar{\mathcal{G}}R^{-1}\bar{\mathcal{G}}^{\mathsf{T}}\phi^{x} + (\varphi^{x})^{\mathsf{T}}(\bar{\mathcal{F}} - \bar{\mathcal{G}}R^{-1}\bar{\mathcal{G}}^{\mathsf{T}}\phi^{x}) \nonumber\\
& + \frac{\epsilon^{2}}{2}\,\mathrm{tr}(\varphi^{xx}), \tag{47}
\end{align}

with terminal condition $\varphi(T,x)=c_{T}(x)$. From Proposition 1, we know $\varphi=\varphi^{0}+\epsilon^{2}\varphi^{1}+\epsilon^{4}\varphi^{2}+\cdots$. Substituting this in Eq. (47) gives

\begin{align}
-\frac{\partial\varphi^{0}}{\partial t} - \epsilon^{2}\frac{\partial\varphi^{1}}{\partial t} - \epsilon^{4}\frac{\partial\varphi^{2}}{\partial t} + \cdots ={}& \bar{l} + \frac{1}{2}(\phi^{x})^{\mathsf{T}}\bar{\mathcal{G}}R^{-1}\bar{\mathcal{G}}^{\mathsf{T}}\phi^{x} \nonumber\\
& + (\varphi^{0,x}+\epsilon^{2}\varphi^{1,x}+\cdots)^{\mathsf{T}}\Big(\bar{\mathcal{F}} - \bar{\mathcal{G}}R^{-1}\bar{\mathcal{G}}^{\mathsf{T}}\phi^{x}\Big) \nonumber\\
& + \frac{\epsilon^{2}}{2}\,\mathrm{tr}(\varphi^{0,xx}+\epsilon^{2}\varphi^{1,xx}+\cdots). \tag{48}
\end{align}

As before, if we gather the terms for $\epsilon^{0}$, $\epsilon^{2}$, etc., on both sides of the above equation, we obtain the equations governing $\varphi^{0},\varphi^{1}$, etc. First, looking at the $\epsilon^{0}$ term in Eq. (48), we obtain:

\begin{equation}
-\frac{\partial\varphi^{0}}{\partial t} = \bar{l} + \frac{1}{2}(\phi^{x})^{\mathsf{T}}\bar{\mathcal{G}}R^{-1}\bar{\mathcal{G}}^{\mathsf{T}}\phi^{x} + (\varphi^{0,x})^{\mathsf{T}}(\bar{\mathcal{F}} - \bar{\mathcal{G}}R^{-1}\bar{\mathcal{G}}^{\mathsf{T}}\phi^{x}), \tag{49}
\end{equation}

with the terminal condition $\varphi^{0}(T,x)=c_{T}(x)$.

Comparing Eqs. (49) and (46), it follows that $\phi(t,x)=\varphi^{0}(t,x)$ for all $(t,x)$. Further, comparing them to Eq. (44), it follows that $\varphi^{0}(t,x)=J^{0}(t,x)$ for all $(t,x)$. Also, note that the closed-loop dynamics above satisfy $\bar{\mathcal{F}}-\bar{\mathcal{G}}R^{-1}\bar{\mathcal{G}}^{\mathsf{T}}\phi^{x}=\bar{f}^{0}$ (see Eqs. (44) and (45)).

Next, consider the $\epsilon^{2}$ terms in Eq. (48). We obtain:

\begin{equation}
-\frac{\partial\varphi^{1}}{\partial t} = (\varphi^{1,x})^{\mathsf{T}}\underbrace{(\bar{\mathcal{F}} - \bar{\mathcal{G}}R^{-1}\bar{\mathcal{G}}^{\mathsf{T}}\phi^{x})}_{\bar{f}^{0}} + \frac{1}{2}\,\mathrm{tr}(\varphi^{0,xx}), \tag{50}
\end{equation}

with terminal condition $\varphi^{1}(T,x)=0$. Again, comparing Eq. (50) to Eq. (45), and noting that $\varphi^{0}=J^{0}$, it follows that $\varphi^{1}(t,x)=J^{1}(t,x)$ for all $(t,x)$. This completes the proof of the result.\hfill$\blacksquare$

The result above has used the fact that the noise sequence $w_{t}$ is white. However, this is not necessary to show that $J^{0}(t,x)=\varphi^{0}(t,x)$ for all $(t,x)$.
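The sequential structure used in the proof of Proposition 2 can be sketched numerically on a scalar linear-quadratic instance. Everything specific here is an assumption made for illustration and is not taken from the paper: dynamics $\mathcal{F}(x)=ax$, $\mathcal{G}(x)=b$, running state cost $l(x)=\tfrac{1}{2}\alpha x^{2}$, terminal cost $c_{T}(x)=\tfrac{1}{2}\alpha_{T}x^{2}$, control weight $R=r$, and the hypothetical function name `solve_perturbation_terms`. With the ansatz $J^{0}=\tfrac{1}{2}P(t)x^{2}$, the $\epsilon^{0}$ equation reduces to a scalar Riccati equation, and the $\epsilon^{2}$ equation reduces to $-dJ^{1}/dt=\tfrac{1}{2}P(t)$ with $J^{1}(T)=0$.

```python
import math


def solve_perturbation_terms(a, b, alpha, r, alpha_T, T, steps=2000):
    """Integrate the zeroth- and first-order cost terms backward from t = T.

    Illustrative scalar LQ setting (an assumption, not the paper's system):
      J0(t, x) = 0.5 * P(t) * x**2,  with  -dP/dt = alpha + 2 a P - (b^2 / r) P^2,
      J1(t)    = 0.5 * integral_t^T P(s) ds.
    Returns (P(0), J1(0)).
    """
    dt = T / steps

    def riccati_rhs(P):
        # dP/dt for the scalar Riccati equation.
        return -(alpha + 2.0 * a * P - (b * b / r) * P * P)

    P = alpha_T  # terminal condition P(T) = alpha_T
    J1 = 0.0     # terminal condition J1(T) = 0
    for _ in range(steps):
        # Classical RK4 step with negative step size (from t to t - dt).
        k1 = riccati_rhs(P)
        k2 = riccati_rhs(P - 0.5 * dt * k1)
        k3 = riccati_rhs(P - 0.5 * dt * k2)
        k4 = riccati_rhs(P - dt * k3)
        P_prev = P - dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        # J1 needs only P: accumulate 0.5 * integral of P by the trapezoid rule.
        J1 += 0.25 * dt * (P + P_prev)
        P = P_prev
    return P, J1


if __name__ == "__main__":
    P0, J1_0 = solve_perturbation_terms(a=0.0, b=1.0, alpha=1.0, r=1.0, alpha_T=0.0, T=1.0)
    print(P0, math.tanh(1.0))                     # closed form for this case: tanh(T)
    print(J1_0, 0.5 * math.log(math.cosh(1.0)))   # closed form: 0.5 * ln cosh(T)
```

In this example the Riccati variable is advanced without any reference to $J^{1}$, and $J^{1}$ is then obtained from $P$ alone. For the particular parameters in the usage lines, the closed forms $P(t)=\tanh(T-t)$ and $J^{1}(0)=\tfrac{1}{2}\ln\cosh T$ provide a sanity check on the integration.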

    A-E Proof of Proposition 3

    Proof:

Let the system model be given as $\dot{x}=\mathcal{F}(x)+\mathcal{G}(x)u$, where the system matrices and their Jacobians and Hessians are defined as in Definition 1.
Using indicial notation, the Lagrange-Charpit equations are (the subscript $t$ is omitted for simplicity):

    \begin{align}
    \dot{x}_{i} &= f_{i}(x) - \Gamma_{i}^{j} R^{-1}_{jm} \Gamma_{m}^{n} q_{n}, \tag{51}\\
    \dot{q}_{i} &= -L_{i}^{x} - f_{ij}^{x} q_{j} + q_{n} \Gamma_{m}^{n} R^{-1}_{lm} \Gamma^{l,x}_{ik} q_{k}. \tag{52}
    \end{align}

    Performing a perturbation expansion of $\dot{x}$ around a nominal trajectory $\bar{x}$ gives

    \begin{multline}
    \delta\dot{x}_{i} = (f_{ij}^{x} - \Gamma_{ij}^{k,x} R^{-1}_{km} \Gamma_{m}^{n} q_{n} - \Gamma_{k}^{i} R^{-1}_{km} \Gamma_{nj}^{m,x} q_{n})\,\delta x_{j} + \frac{1}{2}(f_{ijk}^{xx} - R^{-1}_{lm} \Gamma_{m}^{n} q_{n} \Gamma_{ikj}^{l,xx} \\
    - \Gamma_{ik}^{l,x} R^{-1}_{lm} \Gamma_{nj}^{m,x} q_{n} - \Gamma_{i}^{l} R^{-1}_{lm} \Gamma_{nkj}^{m,xx} q_{n})\,\delta x_{k}\,\delta x_{j} \\
    - \Gamma_{i}^{j} R^{-1}_{jm} \Gamma_{m}^{n}\,\delta q_{n} + \tilde{H}(\delta x^{3}) + \tilde{S}(\delta q^{2}). \tag{53}
    \end{multline}

    Expanding the co-states about the nominal gives

    \begin{align}
    q_{i} &= G_{i} + P_{ij}\,\delta x_{j} + H(\delta x^{2}), \tag{54}\\
    \delta q_{i} &= P_{ij}\,\delta x_{j} + H(\delta x^{2}). \tag{55}
    \end{align}

    Substituting Eq. (54) and (55) in Eq. (53), we get

    δx˙i𝛿subscript˙𝑥𝑖\displaystyle\delta\dot{x}_{i}italic_δ over˙ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT =(f¯ijxΓ¯ijk,xRkm𝟣Γ¯mnqnΓ¯kiRkm𝟣Γ¯njm,xqn\displaystyle=(\bar{f}_{ij}^{x}-\bar{\Gamma}_{ij}^{k,x}{R}^{\mathsf{-1}}_{km}% \bar{\Gamma}_{m}^{n}q_{n}-\bar{\Gamma}_{k}^{i}{R}^{\mathsf{-1}}_{km}\bar{% \Gamma}_{nj}^{m,x}q_{n}= ( over¯ start_ARG italic_f end_ARG start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_x end_POSTSUPERSCRIPT - over¯ start_ARG roman_Γ end_ARG start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k , italic_x end_POSTSUPERSCRIPT italic_R start_POSTSUPERSCRIPT - sansserif_1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k italic_m end_POSTSUBSCRIPT over¯ start_ARG roman_Γ end_ARG start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_q start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - over¯ start_ARG roman_Γ end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT italic_R start_POSTSUPERSCRIPT - sansserif_1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k italic_m end_POSTSUBSCRIPT over¯ start_ARG roman_Γ end_ARG start_POSTSUBSCRIPT italic_n italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m , italic_x end_POSTSUPERSCRIPT italic_q start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT
    Γ¯ilRlm𝟣Γ¯mnPnj)δxj+H.O.T.\displaystyle-\bar{\Gamma}_{i}^{l}{R}^{\mathsf{-1}}_{lm}\bar{\Gamma}_{m}^{n}P_% {nj})\delta x_{j}+H.O.T.- over¯ start_ARG roman_Γ end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT italic_R start_POSTSUPERSCRIPT - sansserif_1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l italic_m end_POSTSUBSCRIPT over¯ start_ARG roman_Γ end_ARG start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_P start_POSTSUBSCRIPT italic_n italic_j end_POSTSUBSCRIPT ) italic_δ italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT + italic_H . italic_O . italic_T . (56)

    Let $\mathcal{M}_{ij} = \bar{f}_{ij}^{x} - \bar{\Gamma}_{ij}^{k,x} R^{-1}_{km} \bar{\Gamma}_{m}^{n} q_{n} - \bar{\Gamma}_{k}^{i} R^{-1}_{km} \bar{\Gamma}_{nj}^{m,x} q_{n} - \bar{\Gamma}_{i}^{l} R^{-1}_{lm} \bar{\Gamma}_{m}^{n} P_{nj}$. Differentiating Eq. (54) and using Eq. (56), we get

    \begin{align}
    \dot{q}_{i} &= \dot{G}_{i} + P_{ij}\,\delta\dot{x}_{j} + \dot{P}_{ij}\,\delta x_{j} + \cdots, \tag{57}\\
    \dot{q}_{i} &= \dot{G}_{i} + P_{ij}(\mathcal{M}_{jk}\,\delta x_{k} + \cdots) + \dot{P}_{ij}\,\delta x_{j} + \cdots. \tag{58}
    \end{align}

    Expanding Eq. (52) up to first order about the nominal trajectory and substituting Eq. (54),

    \begin{align}
    \dot{q}_{i} &= -(\bar{L}_{i}^{x} + L_{ij}^{xx}\,\delta x_{j} + \cdots) - \bar{f}_{ij}^{x}(G_{j} + P_{jk}\,\delta x_{k} + \cdots) \notag\\
    &\quad - \delta x_{m}\,\bar{f}_{ijm}^{xx}(G_{j} + P_{jk}\,\delta x_{k} + \cdots) + (G_{n} + P_{nk}\,\delta x_{k} + \cdots) \notag\\
    &\quad \times (\bar{\Gamma}_{m}^{n} R^{-1}_{lm} \bar{\Gamma}_{ip}^{l,x} + \bar{\Gamma}_{mj}^{n,x}\,\delta x_{j}\, R^{-1}_{lm} \bar{\Gamma}_{ip}^{l,x} + \bar{\Gamma}_{m}^{n} R^{-1}_{lm} \bar{\Gamma}_{ipj}^{l,xx}\,\delta x_{j} + \cdots) \notag\\
    &\quad \times (G_{p} + P_{pr}\,\delta x_{r} + \cdots). \tag{59}
    \end{align}

    Comparing the terms up to first order in $\delta x$ in Eq. (58) and Eq. (59), with an appropriate change of indices, we get

    \begin{align}
    \dot{G}_{i} &= -\bar{L}_{i}^{x} - \bar{f}_{ij}^{x} G_{j} + G_{n} \bar{\Gamma}_{m}^{n} R^{-1}_{lm} \bar{\Gamma}_{ip}^{l,x} G_{p}, \tag{60}\\
    \dot{P}_{ij} &= -P_{ik}(\bar{f}_{kj}^{x} - \bar{\Gamma}_{kj}^{l,x} R^{-1}_{lm} \bar{\Gamma}_{m}^{n} G_{n}) - (\bar{f}_{ik}^{x} - G_{n} \bar{\Gamma}_{m}^{n} R^{-1}_{lm} \bar{\Gamma}_{ik}^{l,x}) P_{kj} \notag\\
    &\quad - L_{ij}^{xx} - (\bar{f}_{ipj}^{xx} - G_{n} \bar{\Gamma}_{m}^{n} R^{-1}_{lm} \bar{\Gamma}_{ipj}^{l,xx}) G_{p} + P_{ik} \bar{\Gamma}_{l}^{k} R^{-1}_{lm} \bar{\Gamma}_{nj}^{m,x} G_{n} \notag\\
    &\quad + P_{ik} \bar{\Gamma}_{k}^{l} R^{-1}_{lm} \bar{\Gamma}_{m}^{n} P_{nj} + P_{nj} \bar{\Gamma}_{m}^{n} R^{-1}_{lm} \bar{\Gamma}_{ip}^{l,x} G_{p} \notag\\
    &\quad + G_{n} \bar{\Gamma}_{mj}^{n,x} R^{-1}_{lm} \bar{\Gamma}_{ip}^{l,x} G_{p}. \tag{61}
    \end{align}
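
    As a consistency check on the three terms of Eq. (61) that begin with $P_{ik}$, the first-order matching can be sketched as follows; the relabeling of dummy indices is ours and is only meant as an illustration. The coefficient of $\delta x_{j}$ in Eq. (58) is $\dot{P}_{ij} + P_{ik}\mathcal{M}_{kj}$, where, after renaming indices and replacing $q_{n}$ by its leading-order value $G_{n}$ from Eq. (54),

    \begin{equation*}
    \mathcal{M}_{kj} = \bar{f}_{kj}^{x} - \bar{\Gamma}_{kj}^{l,x} R^{-1}_{lm} \bar{\Gamma}_{m}^{n} G_{n} - \bar{\Gamma}_{l}^{k} R^{-1}_{lm} \bar{\Gamma}_{nj}^{m,x} G_{n} - \bar{\Gamma}_{k}^{l} R^{-1}_{lm} \bar{\Gamma}_{m}^{n} P_{nj}.
    \end{equation*}

    Moving $P_{ik}\mathcal{M}_{kj}$ to the right-hand side contributes

    \begin{equation*}
    -P_{ik}\mathcal{M}_{kj} = -P_{ik}(\bar{f}_{kj}^{x} - \bar{\Gamma}_{kj}^{l,x} R^{-1}_{lm} \bar{\Gamma}_{m}^{n} G_{n}) + P_{ik} \bar{\Gamma}_{l}^{k} R^{-1}_{lm} \bar{\Gamma}_{nj}^{m,x} G_{n} + P_{ik} \bar{\Gamma}_{k}^{l} R^{-1}_{lm} \bar{\Gamma}_{m}^{n} P_{nj},
    \end{equation*}

    which reproduces those three terms; the remaining terms of Eq. (61) are the coefficient of $\delta x_{j}$ in Eq. (59).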

    Substituting $\bar{u}_{l} = -R^{-1}_{lm} \bar{\Gamma}_{m}^{n} G_{n}$ and changing indices to group terms, Eq. (60) can be written as $\dot{G}_{i} = -\bar{L}_{i}^{x} - (\bar{f}_{ij}^{x} + \bar{u}_{l} \bar{\Gamma}_{ij}^{l,x}) G_{j}$, whose vector form is Eq. (27). Similarly, substituting $\bar{u}_{l}$ in Eq. (61) gives

    \begin{align}
    \dot{P}_{ij} &= -P_{ik}(\bar{f}_{kj}^{x} + \bar{\Gamma}_{kj}^{l,x} \bar{u}_{l}) - (\bar{f}_{ik}^{x} + \bar{u}_{l} \bar{\Gamma}_{ik}^{l,x}) P_{kj} - L_{ij}^{xx} \notag\\
    &\quad - (\bar{f}_{ipj}^{xx} + \bar{u}_{l} \bar{\Gamma}_{ipj}^{l,xx}) G_{p} + K_{li} R_{lm} K_{mj}, \tag{62}\\
    \text{where } K_{ij} &= -R^{-1}_{im}(\bar{\Gamma}_{k}^{m} P_{kj} + \bar{\Gamma}_{kj}^{m,x} G_{k}). \tag{63}
    \end{align}

    Eq. (28) and Eq. (29) are the vector form of Eq. (62) and Eq. (63) respectively. \hfill\blacksquare
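
    As a sanity check, and not as part of the proof, the equations above can be specialized to a scalar linear-quadratic problem. The constants $a$, $b$, $s$, and $r$ below are hypothetical and introduced only for this illustration: take $f(x)=ax$, a constant input gain $\Gamma = b$, state cost $L(x)=\frac{1}{2}sx^{2}$, and control weight $R=r>0$. Then $\bar{f}^{x}=a$, $\bar{L}^{x}=s\bar{x}$, $L^{xx}=s$, and all derivatives of $\Gamma$ and the second derivative of $f$ vanish, so that Eqs. (60), (62), and (63) reduce to

    \begin{align*}
    \dot{G} &= -s\bar{x} - aG, \\
    K &= -\frac{b}{r}P, \\
    \dot{P} &= -2aP - s + \frac{b^{2}}{r}P^{2}.
    \end{align*}

    The last line is the familiar scalar differential Riccati equation of linear-quadratic regulation, and $K$ is the corresponding feedback gain. One can also verify that $G = P\bar{x}$ is consistent with the first line: using $\bar{u} = -bG/r$ and $\dot{\bar{x}} = a\bar{x} + b\bar{u}$, we get $\frac{d}{dt}(P\bar{x}) = \dot{P}\bar{x} + P\dot{\bar{x}} = -s\bar{x} - aP\bar{x}$.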

    A-F Proof of Proposition 4

    Proof:

    We show the scalar case; the vector case is a straightforward extension. We need the function $F(t,x,p,q,J) = p + l - \frac{1}{2}\frac{g^{2}}{r}q^{2} + fq$ to be $\mathcal{C}^{2}$ in all its arguments for unique characteristic curves, i.e., characteristic curves that do not intersect, since then the functions $F_x$ and $F_q$ are $\mathcal{C}^{1}$, and therefore locally Lipschitz continuous. From the existence and uniqueness results for ODEs [nonlinear_systems_khalil, Ch. 3.1], it follows that the Lagrange-Charpit characteristic ODEs $\dot{x} = F_q$, $\dot{q} = -F_x - qF_J$ have Lipschitz continuous right-hand sides and, therefore, have unique solutions on the interval $[0,T]$. Moreover, the state $x$ and co-state $q = J^{x}$ vary continuously with respect to the terminal condition $x_T$ at any time $t$.
Let us denote $q_T = c_T^{x}(x_T) \equiv \phi_T(x_T)$. Thus, $q_T$ is a function of $x_T$, i.e., $q_T$ is uniquely determined by the value of $x_T$.
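    For concreteness, the characteristic ODEs can be written out for the scalar $F$ defined at the start of this proof. The following is a sketch under two assumptions that are made here only for illustration: $r$ is a constant, and $l_x$, $f_x$, $g_x$ denote derivatives of $l$, $f$, $g$ with respect to $x$. Since $F$ has no explicit dependence on $J$, we have $F_J = 0$.

```latex
\begin{align*}
F_q &= f - \frac{g^{2}}{r}\,q, &
F_x &= l_x + f_x\,q - \frac{g\,g_x}{r}\,q^{2}, &
F_J &= 0, \\
\dot{x} &= F_q = f - \frac{g^{2}}{r}\,q, &
\dot{q} &= -F_x - qF_J = -l_x - f_x\,q + \frac{g\,g_x}{r}\,q^{2}.
\end{align*}
```

    These scalar expressions have the same term-by-term structure as the vector-case characteristic ODEs stated at the end of the proof.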
    Next, we show that under the Lagrange-Charpit equations, the function $\phi_T(x_T)$ remains a function, i.e., we can write $q_t = \phi_t(x_t)$, for some suitable smooth function $\phi_t(\cdot)$, for any $t \in [0,T]$. In order to show this, suppose that this is not the case for some $t$. Then, it is necessary that there exist $x_t^*$ such that $\frac{dq_t}{dx_t}\big|_{x_t^*} = \pm\infty$, or equivalently that $\frac{dx_t}{dq_t}\big|_{x_t^*} = 0$ (see Fig. 4).
This in turn implies that there exists a terminal condition $x_T^*$ such that $\frac{dx_t}{dx_T}\big|_{x_T^*} = 0$, where the terminal state $x_T^*$ maps to the state $x_t^*$ under the Lagrange-Charpit equations. We will now show that this is not feasible.
Owing to the uniqueness of the solutions of the Lagrange-Charpit equations, the Jacobian $\begin{vmatrix} \frac{\partial x_t}{\partial x_T} & \frac{\partial x_t}{\partial q_T} \\ \frac{\partial q_t}{\partial x_T} & \frac{\partial q_t}{\partial q_T} \end{vmatrix}_{(x_T,q_T)} \neq 0$, for any $(x_T,q_T)$. Thus, for $q_T = \phi_T(x_T)$, substituting into the above equation implies that:

    \begin{equation}
    \frac{\partial x_t}{\partial x_T} - \phi_T'(x_T)\frac{\partial x_t}{\partial q_T} \neq 0, \tag{64}
    \end{equation}

    for any terminal state $x_T$, where $\phi_T'(\cdot)$ represents the derivative of the function. Consider now the state $x_T^*$. Owing to the fact that $\frac{dx_t}{dx_T}\big|_{x_T^*} = 0$, we obtain that:

    \begin{equation}
    \frac{dx_t}{dx_T} = \frac{\partial x_t}{\partial x_T}\frac{dx_T}{dx_T} + \frac{\partial x_t}{\partial q_T}\frac{dq_T}{dx_T} = \frac{\partial x_t}{\partial x_T} + \phi_T'(x_T^*)\frac{\partial x_t}{\partial q_T} = 0, \tag{65}
    \end{equation}

    where the partial derivatives are taken at $x_T^*$. The above implies that $\frac{dq_T}{dx_T} = -\phi_T'(x_T^*)$; however, by definition, $\frac{dq_T}{dx_T} = \phi_T'(x_T^*)$, which means that $\phi_T'(x_T^*) = 0$. Owing to Eq. (64), this means that $\frac{\partial x_t}{\partial x_T}\big|_{x_T^*} \neq 0$. However, using the second equality in Eq. (65), this implies that $\frac{dx_t}{dx_T}\big|_{x_T^*} \neq 0$, which contradicts the assumption that $\frac{dx_t}{dx_T}\big|_{x_T^*} = 0$. Thus, it follows that $q_t = \phi_t(x_t)$, for some smooth function $\phi_t(\cdot)$, for any $t \in [0,T]$.

    Figure 4: Mapping of the terminal conditions under the Lagrange-Charpit equations (two panels: the terminal curve $q_T = \phi_T(x_T)$ in the $(x_T, q_T)$ plane, and its image $q_t = \phi_t(x_t)$ in the $(x_t, q_t)$ plane). For two characteristic curves to flow through the same state $\bar{x}_t$, the $\phi_t(\cdot)$ curve has to fold on itself, necessitating the existence of a state $x_t^*$ such that $\frac{dx_t}{dq_t}\big|_{x_t^*} = 0$. Given that the Lagrange-Charpit equations have unique solutions, such a state $x_t^*$ cannot exist.

    Next, note that if a characteristic curve flows through the initial state $x_0$, then we have found a terminal state $x_T$, along with the terminal co-state $q_T = c_T^{x}(x_T)$, that satisfies the Lagrange-Charpit equations. However, this is, by definition, a solution that is found by satisfying the Minimum Principle. Therefore, owing to the development above, the co-state $q_0 = \phi_0(x_0)$ is uniquely determined by the initial state $x_0$, and a solution that satisfies the Minimum Principle is necessarily unique. Moreover, since this solution is the unique characteristic curve of the HJB flowing through $x_0$, it is also the global optimum.
    The arguments made above can be generalized to the vector case, where the function $F(t,x,J,p,q)$ is defined as $F(t,x,J,p,q) = p + l - \frac{1}{2}{q}^{\mathsf{T}}\mathcal{G}(x){R}^{\mathsf{-1}}{\mathcal{G}(x)}^{\mathsf{T}}q + {q}^{\mathsf{T}}\mathcal{F}(x)$, and the equivalent Lagrange-Charpit characteristic ODEs are: $\dot{x}_i = f_i(x) - \Gamma_i^{j}{R}^{\mathsf{-1}}_{jm}\Gamma_m^{n}q_n,\ \dot{q}_i = -L_i^{x} - f_{ij}^{x}q_j + q_n\Gamma_m^{n}{R}^{\mathsf{-1}}_{lm}\Gamma^{l,x}_{ik}q_k.$ \hfill$\blacksquare$
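    The scalar argument can be checked numerically on a simple instance. The sketch below is illustrative only and is not taken from the paper: it assumes hypothetical linear-quadratic data ($f = ax$, $g = 1$, $l = x^2/2$, $r = 1$, $c_T = s_T x^2/2$), integrates the scalar characteristic ODEs $\dot{x} = f - \frac{g^2}{r}q$, $\dot{q} = -l_x - f_x q + \frac{g g_x}{r}q^2$ backward from $q_T = c_T^x(x_T)$ with a hand-written RK4 scheme, and tests two things: that $x_0$ is strictly monotone in $x_T$ (so $dx_t/dx_T \neq 0$ on the grid), and that $q_0$ is a single-valued function of $x_0$, here compared against a backward Riccati solution for the same data.

```python
import numpy as np

# Hypothetical problem data (assumptions for illustration, not from the paper):
# f(x) = A*x, g(x) = 1, l(x) = x^2/2, r = R, c_T(x) = S_T * x^2 / 2, horizon T.
A, R, S_T, T = -0.5, 1.0, 2.0, 1.0

def f(x): return A * x
def f_x(x): return A
def g(x): return 1.0
def g_x(x): return 0.0
def l_x(x): return x
def phi_T(x): return S_T * x  # terminal co-state q_T = c_T^x(x_T)

def char_rhs(z):
    """Right-hand side of the scalar characteristic ODEs, z = (x, q)."""
    x, q = z
    x_dot = f(x) - g(x) ** 2 / R * q
    q_dot = -l_x(x) - f_x(x) * q + g(x) * g_x(x) / R * q ** 2
    return np.array([x_dot, q_dot])

def riccati_rhs(P):
    """dP/dt for J = P x^2 / 2 under the same linear-quadratic data."""
    return -(1.0 + 2.0 * A * P - P ** 2 / R)

def rk4_backward(rhs, z_T, horizon, steps=2000):
    """Integrate dz/dt = rhs(z) from t = horizon back to t = 0 with RK4."""
    z = np.asarray(z_T, dtype=float)
    h = -horizon / steps  # negative step size: backward in time
    for _ in range(steps):
        k1 = rhs(z)
        k2 = rhs(z + 0.5 * h * k1)
        k3 = rhs(z + 0.5 * h * k2)
        k4 = rhs(z + h * k3)
        z = z + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return z

# Flow a grid of terminal states back to t = 0 along the characteristics.
x_T_grid = np.linspace(-2.0, 2.0, 41)
z0 = np.array([rk4_backward(char_rhs, [x_T, phi_T(x_T)], T) for x_T in x_T_grid])
x0, q0 = z0[:, 0], z0[:, 1]

# Finite-difference estimate of dx_0/dx_T along the terminal curve.
dx0_dxT = np.gradient(x0, x_T_grid)

# Reference co-state map q_0 = P(0) x_0 from the backward Riccati solution.
P0 = float(rk4_backward(riccati_rhs, S_T, T))

print("min dx0/dxT on grid:", dx0_dxT.min())
print("max |q0 - P0*x0|:", np.abs(q0 - P0 * x0).max())
```

    Replacing `f`, `g`, `l_x`, and `phi_T` (and their derivatives) with a nonlinear choice lets the same monotonicity check be run on other problems; the Riccati comparison applies only to the linear-quadratic data assumed here.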

    Appendix B Acknowledgment

    This work was supported by the NSF under grants ECCS-1637889, CDSE 1802867, and the AFOSR DDIP program under grant FA9550-17-1-0068. The simulations were conducted with the advanced computing resources provided by Texas A&M High Performance Research Computing.

    \printbibliography

    Author biography:

    Mohamed Naveed Gul Mohamed is pursuing his Ph.D. in Aerospace Engineering at Texas A&M University, College Station. He holds a bachelor’s degree in Instrumentation and Control Engineering from NIT Trichy, India. His research interests are in optimal control of nonlinear dynamical systems, focusing on overcoming challenges such as stochasticity, partial observation, unknown models, and stability concerns.
    Suman Chakravorty is a Professor of Aerospace Engineering at Texas A&M University. He holds a Ph.D. in Aerospace Engineering from the University of Michigan, Ann Arbor. His research interests broadly lie in Estimation and Stochastic Optimal Control Theory with application to Robotic Control and Situational Awareness Problems.
    Raman Goyal earned his Ph.D. in Aerospace Engineering from Texas A&M University, College Station and his B.Tech. degree in Mechanical Engineering from IIT Roorkee, India. Raman is interested in intelligent learning approaches for optimal control of stochastic nonlinear systems. He has also worked on modeling, design, control, and security of various cyber-physical systems.
    Ran Wang received his Ph.D. in Aerospace Engineering from Texas A&M University, College Station and his bachelor’s degree in Mechanical Engineering from Huazhong University of Science and Technology. Ran’s research interests include optimal control and reinforcement learning of soft-body robots.