License: CC BY 4.0
arXiv:2304.00346v3 [cs.RO] 04 Mar 2024

Convergent iLQR for Safe Trajectory Planning and Control
of Legged Robots

James Zhu, J. Joe Payne, and Aaron M. Johnson This material is based upon work supported by the National Science Foundation under grant #CMMI-1943900.All authors are with the Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, PA, USA, [email protected], [email protected]
Abstract

In order to perform highly dynamic and agile maneuvers, legged robots typically spend time in underactuated domains (e.g. with feet off the ground) where the system has limited command of its acceleration and a constrained amount of time before transitioning to a new domain (e.g. foot touchdown). Meanwhile, these transitions can instantaneously change the system’s state, possibly causing perturbations to be mapped arbitrarily far away from the target trajectory. These properties make it difficult for local feedback controllers to effectively recover from disturbances as the system evolves through underactuated domains and hybrid impact events. To address this, we utilize the fundamental solution matrix that characterizes the evolution of perturbations through a hybrid trajectory and its 2-norm, which represents the worst-case growth of perturbations. In this paper, the worst-case perturbation analysis is used to explicitly reason about the tracking performance of a hybrid trajectory and is incorporated in an iLQR framework to optimize a trajectory while taking into account the closed-loop convergence of the trajectory under an LQR tracking controller. The generated convergent trajectories recover more effectively from perturbations, are more robust to large disturbances, and use less feedback control effort than trajectories generated with traditional methods.

Index Terms:
Legged Robots, Trajectory Optimization, Robust Control

I Introduction

Legged robotics research has increasingly focused on enabling highly dynamic and agile motions such as jum**, lea**, and landing [1, 2, 3, 4]. Implementing these capabilities reliably would improve legged robot performance in applications such as extraterrestrial or urban environment navigation where jum** up on ledges or lea** across chasms may be necessary. However, jum** and lea** are dangerous maneuvers, with failure often resulting in catastrophic outcomes for the robot.

What makes these actions challenging is that they induce trajectories that are both hybrid and underactuated, which doubly contribute to the difficulty in controlling legged robots. Broadly speaking, a system is hybrid if it undergoes discrete changes in state and/or dynamics [5, 6], and it is underactuated if there exists a direction of acceleration in state space that can not be commanded by any valid input [7, Ch. 1.2]. Even when an underactuated system is controllable, driving the system to a desired target state may require significant time and control effort, neither of which may be readily available. For instance, a robot jum** in the air can not arbitrarily choose how much time it has until its feet touchdown on the ground. This means that the controller needs to spend a lot of effort to correct tracking errors prior to touchdown, or else discontinuous, unbounded saltation effects [8] can cause arbitrarily large divergence if incoming errors are not sufficiently mitigated, e.g. with grazing impacts. Increasing control gains is one possible solution to improve stability, though that strategy comes at a large drawback of worsening robustness in the face of modelling errors and uncertainties[9, Ch. 13]. Instead, this work leverages nonlinearities in continuous and hybrid dynamics that make some trajectories easier to stabilize than others, even under equivalent feedback controllers.

Refer to caption
Figure 1: Similar quadruped gaits tracked with equivalent LQR controllers display enormous differences in final tracking performance. Results for 50 paired trials of trajectories sampled from an initial error covariance of 102superscript10210^{-2}10 start_POSTSUPERSCRIPT - 2 end_POSTSUPERSCRIPT in all directions are shown. The top blue trajectory was generated with a standard (vanilla) iLQR algorithm, while the bottom green trajectory was generated with our novel χ𝜒\chiitalic_χ-iLQR, which improves the average closed-loop convergence by 92%. The horizontal distance between the displayed frames is exaggerated for visual clarity. Only the convergence of the position states is represented, and does not indicate convergence of the velocity states.

This paper presents a novel adaptation of the iLQR trajectory optimization algorithm that improves closed-loop convergence under an equivalent feedback controller (i.e. without changing the LQR controller weights), as demonstrated in Fig. 1. Our simulation results show that this convergent iLQR (χ𝜒\chiitalic_χ-iLQR) achieves three simultaneous improvements over standard iLQR: superior tracking performance from initial perturbations, reduced feedback control effort over the trajectory, and improved robustness to large initial errors. Compared to existing methods, χ𝜒\chiitalic_χ-iLQR has two additional key strengths. Firstly, it is based on an analysis that is simple to compute compared to methods such as sum-of-squares. Additionally, χ𝜒\chiitalic_χ-iLQR captures the local tracking performance of a closed-loop trajectory, which directly predicts experimental results.

II Related Works

A strategy that has been used to enable dynamic yet precarious behaviors for legged robots is leveraging highly accurate, complex models and full-body trajectory optimization to plan precise motions [3, 10]. While these methods incorporate feedback controllers to stabilize the generated trajectories, there has been little focus on how these feedback controllers should be designed to stabilize closed-loop systems under error and uncertainty.

Robust trajectory planning has been successfully implemented for smooth systems like wheeled robots, with some recent results being adapted to hybrid systems like legged robots. These works have focused on optimizing over uncertainties in system dynamics, such as unknown disturbances and modelling errors [11, 12, 13]. For example, [11] designs robust closed-loop trajectories for smooth systems by optimizing the volume reduction of an ellipsoidal disturbance set, but was not applied to hybrid systems. Risk-sensitive planning and control is an alternate method that optimizes over the variance of a cost distribution that evolves through the trajectory [14, 15]. Other approaches present trajectory optimization algorithms for legged robots over uncertain terrain [16] and compute a forward reachable set to bound closed-loop errors [17]. Many of these methods require the distribution of errors to be prespecified, which is not always clear how to tune. Additional actuation, such as reaction wheels or tails, also relieves the difficulties of underactuated systems [18, 19]. However, this comes with obvious tradeoffs of increased cost, size, and weight.

Separately, consider the problem of quantifying the stability or convergence properties of a system. A very popular method is Lyapunov analysis, where the existence of a positive definite differentiable scalar function with negative definite derivatives, called the Lyapunov function, can guarantee asymptotic stability of the system [20]. Lyapunov functions can be difficult to compute, particularly for hybrid systems, and can require methods such as sum-of-squares [21] or machine learning [22] to be tractable. A similar strategy known as control barrier functions, which restricts the system from entering some set of undesirable states, has been successfully implemented on legged robot hardware [23], but has the same drawback as Lyapunov functions.

A different strategy to analyze the stability and convergence of trajectories is contraction analysis [24], which tracks the distance between two close trajectories. If this distance monotonically decreases over the trajectory, then the system is contractive and asymptotic stability can be guaranteed [24]. Contraction analysis has been incorporated into path planning and trajectory optimization algorithms on smooth systems [25, 26], but applying contraction analysis to hybrid systems is difficult because many mechanical hybrid systems are not contractive at hybrid events [27]. [28] loosened the contraction criterion and optimized the stability of open-loop periodic orbits using monodromy matrix analysis. Here, we extend that work by generalizing to non-periodic trajectories under feedback control.

III Analysis and Planning for Hybrid Systems

This section defines a hybrid system and quantifies the performance of a closed-loop hybrid trajectory using linearized variational equations. With this analysis, we can generate a scalar measure of a trajectory’s convergence which is then incorporated into a trajectory optimization framework.

III-A Hybrid Systems

Hybrid systems are a class of dynamical systems that consist of continuous domains connected by hybrid events[5, 6]. Following the notation in [8], we describe a hybrid system as a set of discrete modes {I,J,,K}IJK\{\mathrm{I},\mathrm{J},\ldots,\mathrm{K}\}{ roman_I , roman_J , … , roman_K }, each with a domain DIsubscript𝐷I{D}_{\mathrm{I}}italic_D start_POSTSUBSCRIPT roman_I end_POSTSUBSCRIPT and a time-varying vector field FIsubscript𝐹IF_{\mathrm{I}}italic_F start_POSTSUBSCRIPT roman_I end_POSTSUBSCRIPT. G(I,J)subscript𝐺IJG_{(\mathrm{I},\mathrm{J})}italic_G start_POSTSUBSCRIPT ( roman_I , roman_J ) end_POSTSUBSCRIPT is a guard that triggers a transition between mode II\mathrm{I}roman_I and mode JJ\mathrm{J}roman_J and R(I,J)subscript𝑅IJR_{(\mathrm{I},\mathrm{J})}italic_R start_POSTSUBSCRIPT ( roman_I , roman_J ) end_POSTSUBSCRIPT is the reset map defining that transition.

An execution of a hybrid system [29] begins at an initial state x0DIsubscript𝑥0subscript𝐷Ix_{0}\in D_{\mathrm{I}}italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ∈ italic_D start_POSTSUBSCRIPT roman_I end_POSTSUBSCRIPT. With input uI(t,x)subscript𝑢I𝑡𝑥u_{\mathrm{I}}(t,x)italic_u start_POSTSUBSCRIPT roman_I end_POSTSUBSCRIPT ( italic_t , italic_x ), the system obeys the dynamics FIsubscript𝐹IF_{\mathrm{I}}italic_F start_POSTSUBSCRIPT roman_I end_POSTSUBSCRIPT on DIsubscript𝐷ID_{\mathrm{I}}italic_D start_POSTSUBSCRIPT roman_I end_POSTSUBSCRIPT. If the system reaches guard surface G(I,J)subscript𝐺IJG_{(\mathrm{I},\mathrm{J})}italic_G start_POSTSUBSCRIPT ( roman_I , roman_J ) end_POSTSUBSCRIPT, the reset map R(I,J)subscript𝑅IJR_{(\mathrm{I},\mathrm{J})}italic_R start_POSTSUBSCRIPT ( roman_I , roman_J ) end_POSTSUBSCRIPT is applied and the system continues in domain DJsubscript𝐷JD_{\mathrm{J}}italic_D start_POSTSUBSCRIPT roman_J end_POSTSUBSCRIPT under the corresponding dynamics defined by FJsubscript𝐹JF_{\mathrm{J}}italic_F start_POSTSUBSCRIPT roman_J end_POSTSUBSCRIPT. The flow ϕ(t,t0,x0,U)italic-ϕ𝑡subscript𝑡0subscript𝑥0𝑈\phi(t,t_{0},x_{0},U)italic_ϕ ( italic_t , italic_t start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_U ) describes how the hybrid system evolves from some initial time t0subscript𝑡0t_{0}italic_t start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and state x0subscript𝑥0x_{0}italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT until some final time t𝑡titalic_t under input sequence U𝑈Uitalic_U.

III-B Linearized Variational Equations

For both continuous domains and hybrid transitions, linearized variational equations can be constructed to characterize the evolution of perturbations δx𝛿𝑥\delta xitalic_δ italic_x [30]. In each continuous domain, the linearized variational equation is discretized from timestep i𝑖iitalic_i to i+1𝑖1i+1italic_i + 1 and is δxi+1(AIBIKI)δxi𝛿subscript𝑥𝑖1subscript𝐴Isubscript𝐵Isubscript𝐾I𝛿subscript𝑥𝑖\delta x_{i+1}\approx(A_{\mathrm{I}}-B_{\mathrm{I}}K_{\mathrm{I}})\delta x_{i}italic_δ italic_x start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT ≈ ( italic_A start_POSTSUBSCRIPT roman_I end_POSTSUBSCRIPT - italic_B start_POSTSUBSCRIPT roman_I end_POSTSUBSCRIPT italic_K start_POSTSUBSCRIPT roman_I end_POSTSUBSCRIPT ) italic_δ italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT [30] with AIsubscript𝐴𝐼A_{I}italic_A start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT and BIsubscript𝐵𝐼B_{I}italic_B start_POSTSUBSCRIPT italic_I end_POSTSUBSCRIPT being the derivatives of the discretized dynamics in mode II\mathrm{I}roman_I w.r.t. state xisubscript𝑥𝑖x_{i}italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and control inputs uisubscript𝑢𝑖u_{i}italic_u start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, respectively, and KIsubscript𝐾IK_{\mathrm{I}}italic_K start_POSTSUBSCRIPT roman_I end_POSTSUBSCRIPT are linear feedback gains. The feedback term drops out for open-loop systems. For hybrid events, the analogous variational equation is the saltation matrix Ξ(I,J)subscriptΞIJ\Xi_{(\mathrm{I},\mathrm{J})}roman_Ξ start_POSTSUBSCRIPT ( roman_I , roman_J ) end_POSTSUBSCRIPT, which describes the transition between modes II\mathrm{I}roman_I and JJ\mathrm{J}roman_J. The saltation matrix is the first-order approximation of the change in state perturbations from before the hybrid event at δx(t)𝛿𝑥superscript𝑡\delta x(t^{-})italic_δ italic_x ( italic_t start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT ) to perturbations after δx(t+)𝛿𝑥superscript𝑡\delta x(t^{+})italic_δ italic_x ( italic_t start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT ) [27], such that δx(t+)Ξ(I,J)δx(t)𝛿𝑥superscript𝑡subscriptΞIJ𝛿𝑥superscript𝑡\delta x(t^{+})\approx\Xi_{(\mathrm{I},\mathrm{J})}\delta x(t^{-})italic_δ italic_x ( italic_t start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT ) ≈ roman_Ξ start_POSTSUBSCRIPT ( roman_I , roman_J ) end_POSTSUBSCRIPT italic_δ italic_x ( italic_t start_POSTSUPERSCRIPT - end_POSTSUPERSCRIPT ). This linear approximation assumes that nearby trajectories undergo the same mode transition. Computing the saltation matrix relies on the derivatives of the reset and guard of the transition along with the dynamics in each mode, and is detailed in [8].

III-C Convergence Measure

Consider a trajectory that begins at state x0=x(t0)subscript𝑥0𝑥subscript𝑡0x_{0}=x(t_{0})italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = italic_x ( italic_t start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) at time t0subscript𝑡0t_{0}italic_t start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and is executed until time tfsubscript𝑡𝑓t_{f}italic_t start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT where it arrives at state xf=ϕ(tf,t0,x0,U)subscript𝑥𝑓italic-ϕsubscript𝑡𝑓subscript𝑡0subscript𝑥0𝑈x_{f}=\phi(t_{f},t_{0},x_{0},U)italic_x start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT = italic_ϕ ( italic_t start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT , italic_t start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_U ). The control objective is to bring any nearby initial state x¯0=x0+δx0subscript¯𝑥0subscript𝑥0𝛿subscript𝑥0\bar{x}_{0}=x_{0}+\delta x_{0}over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_δ italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT towards the nominal trajectory so that at time tfsubscript𝑡𝑓t_{f}italic_t start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT, x¯f=ϕ(tf,t0,x¯0,U¯)=xf+δxfsubscript¯𝑥𝑓italic-ϕsubscript𝑡𝑓subscript𝑡0subscript¯𝑥0¯𝑈subscript𝑥𝑓𝛿subscript𝑥𝑓\bar{x}_{f}=\phi(t_{f},t_{0},\bar{x}_{0},\bar{U})=x_{f}+\delta x_{f}over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT = italic_ϕ ( italic_t start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT , italic_t start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , over¯ start_ARG italic_U end_ARG ) = italic_x start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT + italic_δ italic_x start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT is closer to xfsubscript𝑥𝑓x_{f}italic_x start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT. To characterize the closeness of x¯fsubscript¯𝑥𝑓\bar{x}_{f}over¯ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT and xfsubscript𝑥𝑓x_{f}italic_x start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT, we utilize the fundamental solution matrix, ΦΦ\Phiroman_Φ. Following [31], the fundamental solution matrix is the linearized approximation δxfΦδx0𝛿subscript𝑥𝑓Φ𝛿subscript𝑥0\delta x_{f}\approx\Phi\delta x_{0}italic_δ italic_x start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT ≈ roman_Φ italic_δ italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and represents the transformation of error from the initial state to final state.

The fundamental solution matrix can be computed by sequentially composing the linearized variational equations in each continuous domain (A~:=ABKassign~𝐴𝐴𝐵𝐾\tilde{A}:=A-BKover~ start_ARG italic_A end_ARG := italic_A - italic_B italic_K) and the saltation matrices (ΞΞ\Xiroman_Ξ) at each hybrid event [28]. For a hybrid trajectory with N𝑁Nitalic_N domains, the fundamental solution matrix is:

ΦΦ\displaystyle\Phiroman_Φ =A~NΞ(N1,N)Ξ(2,3)A~2Ξ(1,2)A~1absentsubscript~𝐴𝑁subscriptΞ𝑁1𝑁subscriptΞ23subscript~𝐴2subscriptΞ12subscript~𝐴1\displaystyle=\tilde{A}_{N}\Xi_{(N-1,N)}\dots\Xi_{(2,3)}\tilde{A}_{2}\Xi_{(1,2% )}\tilde{A}_{1}= over~ start_ARG italic_A end_ARG start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT roman_Ξ start_POSTSUBSCRIPT ( italic_N - 1 , italic_N ) end_POSTSUBSCRIPT … roman_Ξ start_POSTSUBSCRIPT ( 2 , 3 ) end_POSTSUBSCRIPT over~ start_ARG italic_A end_ARG start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT roman_Ξ start_POSTSUBSCRIPT ( 1 , 2 ) end_POSTSUBSCRIPT over~ start_ARG italic_A end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT (1)

Since the fundamental solution matrix captures the change in errors across a trajectory, the singular values of ΦΦ\Phiroman_Φ characterize error change along principle axes of state space. The largest singular value, which is equivalent to the induced 2-norm of ΦΦ\Phiroman_Φ, describes the evolution of the most divergent direction of initial error δx0𝛿subscript𝑥0\delta x_{0}italic_δ italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT. We define the convergence measure, χ𝜒\chiitalic_χ to be exactly this worst-case value:

χ=Φ2𝜒subscriptnormΦ2\displaystyle\chi=||\Phi||_{2}italic_χ = | | roman_Φ | | start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT (2)

χ𝜒\chiitalic_χ is a continuous measure of local convergence, where smaller values of χ𝜒\chiitalic_χ indicate stronger reduction of worst-case final errors. A value of χ<1𝜒1\chi<1italic_χ < 1 indicates errors in all directions will shrink.

III-D iLQR for Hybrid Systems

The iterative linear quadratic regulator (iLQR) is a trajectory optimization method that also computes LQR feedback gains over the generated trajectory [32]. iLQR is convenient because compared to other trajectory optimization methods like direct collocation [33], it is less computationally intensive and guarantees a feasible trajectory. We draw from recent work that adapts the iLQR algorithm for use on hybrid dynamical systems [34, 35].

In brief, iLQR solves the optimal control problem over N𝑁Nitalic_N discretized timesteps:

minUsubscript𝑈\displaystyle\min_{U}\quadroman_min start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT N(xN)+i=0N1i(xi,ui)subscript𝑁subscript𝑥𝑁superscriptsubscript𝑖0𝑁1subscript𝑖subscript𝑥𝑖subscript𝑢𝑖\displaystyle\ell_{N}(x_{N})+\sum_{i=0}^{N-1}\ell_{i}(x_{i},u_{i})roman_ℓ start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ) + ∑ start_POSTSUBSCRIPT italic_i = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N - 1 end_POSTSUPERSCRIPT roman_ℓ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_u start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) (3)
where x0=x(0)subscript𝑥0𝑥0\displaystyle x_{0}=x(0)italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = italic_x ( 0 ) (4)
xi+1=ϕ(ti+1,ti,xi,ui)subscript𝑥𝑖1italic-ϕsubscript𝑡𝑖1subscript𝑡𝑖subscript𝑥𝑖subscript𝑢𝑖\displaystyle x_{i+1}=\phi(t_{i+1},t_{i},x_{i},u_{i})italic_x start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT = italic_ϕ ( italic_t start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT , italic_t start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_u start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ifor-all𝑖\displaystyle\forall i∀ italic_i (5)

where i(xi,ui)subscript𝑖subscript𝑥𝑖subscript𝑢𝑖\ell_{i}(x_{i},u_{i})roman_ℓ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_u start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) and N(xN)subscript𝑁subscript𝑥𝑁\ell_{N}(x_{N})roman_ℓ start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ) represent the nonlinear stage cost and terminal cost, respectively, X:={x0,x1,,xN}assign𝑋subscript𝑥0subscript𝑥1subscript𝑥𝑁X:=\{x_{0},x_{1},...,x_{N}\}italic_X := { italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_x start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT } is a sequence of states with xinsubscript𝑥𝑖superscript𝑛x_{i}\in\mathbb{R}^{n}italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT the system state at timestep i𝑖iitalic_i and U:={u0,u1,,uN1}assign𝑈subscript𝑢0subscript𝑢1subscript𝑢𝑁1U:=\{u_{0},u_{1},...,u_{N-1}\}italic_U := { italic_u start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_u start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_u start_POSTSUBSCRIPT italic_N - 1 end_POSTSUBSCRIPT } is a sequence of control inputs with uimsubscript𝑢𝑖superscript𝑚u_{i}\in\mathbb{R}^{m}italic_u start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT the control input at timestep i𝑖iitalic_i. We also record the sequence of domains M:={D0,D1,,DN}assign𝑀subscript𝐷0subscript𝐷1subscript𝐷𝑁M:=\{D_{0},D_{1},...,D_{N}\}italic_M := { italic_D start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_D start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_D start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT } with Disubscript𝐷𝑖D_{i}italic_D start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT the domain at timestep i𝑖iitalic_i such that xiDisubscript𝑥𝑖subscript𝐷𝑖x_{i}\in D_{i}italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ italic_D start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. ϕitalic-ϕ\phiitalic_ϕ is the aforementioned flow of the trajectory.

iLQR computes gradient and Hessian information of the cost, which results in a quadratic approximation of the cost function. As such, the state and terminal costs can equivalently be simplified as quadratic functions such that the cost function is simplified to:

J=xNTQNxN+i=0N1xiTQixi+uiTRiui𝐽superscriptsubscript𝑥𝑁𝑇subscript𝑄𝑁subscript𝑥𝑁superscriptsubscript𝑖0𝑁1superscriptsubscript𝑥𝑖𝑇subscript𝑄𝑖subscript𝑥𝑖superscriptsubscript𝑢𝑖𝑇subscript𝑅𝑖subscript𝑢𝑖\displaystyle J=x_{N}^{T}Q_{N}x_{N}+\sum_{i=0}^{N-1}x_{i}^{T}Q_{i}x_{i}+u_{i}^% {T}R_{i}u_{i}italic_J = italic_x start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_Q start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT + ∑ start_POSTSUBSCRIPT italic_i = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N - 1 end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_Q start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + italic_u start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_R start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_u start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT (6)

with Qi,QNn×nsubscript𝑄𝑖subscript𝑄𝑁superscript𝑛𝑛Q_{i},Q_{N}\in\mathbb{R}^{n\times n}italic_Q start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_Q start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_n × italic_n end_POSTSUPERSCRIPT and Rim×msubscript𝑅𝑖superscript𝑚𝑚R_{i}\in\mathbb{R}^{m\times m}italic_R start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_m × italic_m end_POSTSUPERSCRIPT all positive definite.

iLQR solves the optimal control problem by alternating between forward passes that simulate the system under a given control input sequence, and backward passes that solve for a new locally optimal control sequence. In the backward pass, the value function, which is the optimal cost to go at any timestep, is propagated through the trajectory in reverse, and gives locally optimal feedforward inputs and feedback gains at each timestep. Computing the value function relies on gradient and Hessian computations of the cost function and Jacobians of the dynamics, which equates to computing the linearized variational equations discussed in Sec. III-B. For much greater detail of iLQR for hybrid systems, see [34, 35].

IV Convergent iLQR

Here we present a novel trajectory optimization algorithm called convergent iLQR or χ𝜒\chiitalic_χ-iLQR, summarized in Algorithm 1. In this method, the convergence measure is added to the cost function from (6) such that the algorithm minimizes:

Jχ=subscript𝐽𝜒absent\displaystyle J_{\chi}=\quaditalic_J start_POSTSUBSCRIPT italic_χ end_POSTSUBSCRIPT = Qχχ+xNTQNxN+i=0N1xiTQixi+uiTRiuisubscript𝑄𝜒𝜒superscriptsubscript𝑥𝑁𝑇subscript𝑄𝑁subscript𝑥𝑁superscriptsubscript𝑖0𝑁1superscriptsubscript𝑥𝑖𝑇subscript𝑄𝑖subscript𝑥𝑖superscriptsubscript𝑢𝑖𝑇subscript𝑅𝑖subscript𝑢𝑖\displaystyle Q_{\chi}\chi+x_{N}^{T}Q_{N}x_{N}+\sum_{i=0}^{N-1}x_{i}^{T}Q_{i}x% _{i}+u_{i}^{T}R_{i}u_{i}italic_Q start_POSTSUBSCRIPT italic_χ end_POSTSUBSCRIPT italic_χ + italic_x start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_Q start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT + ∑ start_POSTSUBSCRIPT italic_i = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N - 1 end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_Q start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + italic_u start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_R start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_u start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT (7)

where Qχsubscript𝑄𝜒Q_{\chi}italic_Q start_POSTSUBSCRIPT italic_χ end_POSTSUBSCRIPT is a scalar weighting parameter. Since χ𝜒\chiitalic_χ is solely a function of states and inputs, iLQR uses gradient and Hessian information to make a quadratic approximation compatible with the other cost terms.

Typically in iLQR, the cost function J𝐽Jitalic_J is evaluated after each forward pass, since it is only dependent on the states and inputs of the most recent trajectory. However, in this case the convergence measure portion of the cost function is dependent on the feedback gains generated by the algorithm. This means that the gradient and Hessian terms of the cost function rely on the feedback gains that are being updated at every timestep in the backward pass. Due to this, the cost function derivatives are highly coupled with the gains and become convoluted to compute.

To resolve this, we propose executing two separate backward passes that each compute a different set of gains. We do this to preserve the convergence properties of iLQR, though other choices might be possible such as borrowing the previous set of gains under the assumption that the gains do not change significantly over iterations. First, the tracking backward pass computes the feedback gains that will be used as the LQR tracking controller gains and to compute the convergence measure. It is equivalent to the backward pass in standard iLQR using the cost function J𝐽Jitalic_J (6), which solves the Riccati equation for the most recent trajectory. With the gains generated in the tracking backward pass Ktsubscript𝐾𝑡K_{t}italic_K start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, the convergent cost function Jχsubscript𝐽𝜒J_{\chi}italic_J start_POSTSUBSCRIPT italic_χ end_POSTSUBSCRIPT (7) can be computed. The search backward pass takes Jχsubscript𝐽𝜒J_{\chi}italic_J start_POSTSUBSCRIPT italic_χ end_POSTSUBSCRIPT from the tracking backward pass and computes the gradients of the convergent cost function with controller gains Ktsubscript𝐾𝑡K_{t}italic_K start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT. The feedforward inputs kssubscript𝑘𝑠k_{s}italic_k start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT and the feedback gains Kssubscript𝐾𝑠K_{s}italic_K start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT from this pass are used to search for an improved trajectory in the forward pass.

Since Jχsubscript𝐽𝜒J_{\chi}italic_J start_POSTSUBSCRIPT italic_χ end_POSTSUBSCRIPT is returned by the tracking backward pass, a line search is performed after this function call to guarantee the reduction of the cost function Jχsubscript𝐽𝜒J_{\chi}italic_J start_POSTSUBSCRIPT italic_χ end_POSTSUBSCRIPT. If the line search condition is not satisfied, the forward pass and tracking backward pass are looped until the line search condition is passed.

Within the search backward pass, iLQR requires computation of the gradient and Hessian of χ𝜒\chiitalic_χ. The derivatives of χ𝜒\chiitalic_χ can be computed by leveraging the singular value decomposition of Φ=USVTΦ𝑈𝑆superscript𝑉𝑇\Phi=USV^{T}roman_Φ = italic_U italic_S italic_V start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT where S𝑆Sitalic_S is a diagonal matrix of singular values and the columns of U𝑈Uitalic_U and V𝑉Vitalic_V are the left and right singular vectors, respectively. χ𝜒\chiitalic_χ is the largest singular value of ΦΦ\Phiroman_Φ and let uχsubscript𝑢𝜒u_{\chi}italic_u start_POSTSUBSCRIPT italic_χ end_POSTSUBSCRIPT and vχsubscript𝑣𝜒v_{\chi}italic_v start_POSTSUBSCRIPT italic_χ end_POSTSUBSCRIPT be its corresponding left and right singular vectors. Following [36], the derivative of χ𝜒\chiitalic_χ with respect to the state at timestep i𝑖iitalic_i is:

χxi𝜒subscript𝑥𝑖\displaystyle\frac{\partial\chi}{\partial x_{i}}divide start_ARG ∂ italic_χ end_ARG start_ARG ∂ italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG =uχTΦxivχabsentsuperscriptsubscript𝑢𝜒𝑇Φsubscript𝑥𝑖subscript𝑣𝜒\displaystyle=u_{\chi}^{T}\frac{\partial\Phi}{\partial x_{i}}v_{\chi}= italic_u start_POSTSUBSCRIPT italic_χ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT divide start_ARG ∂ roman_Φ end_ARG start_ARG ∂ italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG italic_v start_POSTSUBSCRIPT italic_χ end_POSTSUBSCRIPT (8)

ΦxiΦsubscript𝑥𝑖\frac{\partial\Phi}{\partial x_{i}}divide start_ARG ∂ roman_Φ end_ARG start_ARG ∂ italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG in turn can be computed by using the product rule along with leveraging the fact that only A~isubscript~𝐴𝑖\tilde{A}_{i}over~ start_ARG italic_A end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and Ξ(i,i+1)subscriptΞ𝑖𝑖1\Xi_{(i,i+1)}roman_Ξ start_POSTSUBSCRIPT ( italic_i , italic_i + 1 ) end_POSTSUBSCRIPT are functions of xisubscript𝑥𝑖x_{i}italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, and all other A~~𝐴\tilde{A}over~ start_ARG italic_A end_ARG and ΞΞ\Xiroman_Ξ terms have zero derivatives with respect to xisubscript𝑥𝑖x_{i}italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. For notational brevity, let:

Pisubscript𝑃𝑖\displaystyle P_{i}italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT =A~NΞ(i+1,i+2)A~i+1absentsubscript~𝐴𝑁subscriptΞ𝑖1𝑖2subscript~𝐴𝑖1\displaystyle=\tilde{A}_{N}\cdots\Xi_{(i+1,i+2)}\tilde{A}_{i+1}= over~ start_ARG italic_A end_ARG start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ⋯ roman_Ξ start_POSTSUBSCRIPT ( italic_i + 1 , italic_i + 2 ) end_POSTSUBSCRIPT over~ start_ARG italic_A end_ARG start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT (9)
Oisubscript𝑂𝑖\displaystyle O_{i}italic_O start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT =Ξ(i1,i)A~i1Ξ(1,2)A~1absentsubscriptΞ𝑖1𝑖subscript~𝐴𝑖1subscriptΞ12subscript~𝐴1\displaystyle=\Xi_{(i-1,i)}\tilde{A}_{i-1}\cdots\Xi_{(1,2)}\tilde{A}_{1}= roman_Ξ start_POSTSUBSCRIPT ( italic_i - 1 , italic_i ) end_POSTSUBSCRIPT over~ start_ARG italic_A end_ARG start_POSTSUBSCRIPT italic_i - 1 end_POSTSUBSCRIPT ⋯ roman_Ξ start_POSTSUBSCRIPT ( 1 , 2 ) end_POSTSUBSCRIPT over~ start_ARG italic_A end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT (10)

such that Φ=PiΞ(i,i+1)A~iOiΦsubscript𝑃𝑖subscriptΞ𝑖𝑖1subscript~𝐴𝑖subscript𝑂𝑖\Phi=P_{i}\Xi_{(i,i+1)}\tilde{A}_{i}O_{i}roman_Φ = italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT roman_Ξ start_POSTSUBSCRIPT ( italic_i , italic_i + 1 ) end_POSTSUBSCRIPT over~ start_ARG italic_A end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_O start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and Pixi=Oixi=0subscript𝑃𝑖subscript𝑥𝑖subscript𝑂𝑖subscript𝑥𝑖0\frac{\partial P_{i}}{\partial x_{i}}=\frac{\partial O_{i}}{\partial x_{i}}=0divide start_ARG ∂ italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG = divide start_ARG ∂ italic_O start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG = 0. Thus:

ΦxiΦsubscript𝑥𝑖\displaystyle\frac{\partial\Phi}{\partial x_{i}}divide start_ARG ∂ roman_Φ end_ARG start_ARG ∂ italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG =PiΞ(i,i+1)xiA~iOi+PiΞ(i,i+1)A~ixiOiabsentsubscript𝑃𝑖subscriptΞ𝑖𝑖1subscript𝑥𝑖subscript~𝐴𝑖subscript𝑂𝑖subscript𝑃𝑖subscriptΞ𝑖𝑖1subscript~𝐴𝑖subscript𝑥𝑖subscript𝑂𝑖\displaystyle=P_{i}\frac{\partial\Xi_{(i,i+1)}}{\partial x_{i}}\tilde{A}_{i}O_% {i}+P_{i}\Xi_{(i,i+1)}\frac{\partial\tilde{A}_{i}}{\partial x_{i}}O_{i}= italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT divide start_ARG ∂ roman_Ξ start_POSTSUBSCRIPT ( italic_i , italic_i + 1 ) end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG over~ start_ARG italic_A end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_O start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT + italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT roman_Ξ start_POSTSUBSCRIPT ( italic_i , italic_i + 1 ) end_POSTSUBSCRIPT divide start_ARG ∂ over~ start_ARG italic_A end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG italic_O start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT (11)

Derivatives with respect to the input uisubscript𝑢𝑖u_{i}italic_u start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT follow equivalently. To improve computational efficiency, Oisubscript𝑂𝑖O_{i}italic_O start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT at each timestep can be computed recursively during the forward pass and each Pisubscript𝑃𝑖P_{i}italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT can be computed recursively in the tracking backward pass. Since the rollout does not yet have feedback gain information, the initial O𝑂Oitalic_O values must be computed separately.

Algorithm 1 Convergent iLQR Algorithm
Initialize U𝑈Uitalic_U, Qχsubscript𝑄𝜒Q_{\chi}italic_Q start_POSTSUBSCRIPT italic_χ end_POSTSUBSCRIPT, QNsubscript𝑄𝑁Q_{N}italic_Q start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT, Qisubscript𝑄𝑖Q_{i}italic_Q start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, Risubscript𝑅𝑖R_{i}italic_R start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, niterationssubscript𝑛iterationsn_{\text{iterations}}italic_n start_POSTSUBSCRIPT iterations end_POSTSUBSCRIPT
X𝑋Xitalic_X, U𝑈Uitalic_U, M𝑀Mitalic_M, J𝐽absentJ\leftarrowitalic_J ← Rollout(U𝑈Uitalic_U)
Ktsubscript𝐾𝑡K_{t}italic_K start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, Jχsubscript𝐽𝜒J_{\chi}italic_J start_POSTSUBSCRIPT italic_χ end_POSTSUBSCRIPT, P𝑃absentP\leftarrowitalic_P ← TrackingBP(X𝑋Xitalic_X, U𝑈Uitalic_U, M𝑀Mitalic_M, J𝐽Jitalic_J)
O𝑂absentO\leftarrowitalic_O ← ComputeO(X𝑋Xitalic_X, U𝑈Uitalic_U, M𝑀Mitalic_M, Ktsubscript𝐾𝑡K_{t}italic_K start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT)
for i1 to niterations𝑖1 to subscript𝑛iterationsi\leftarrow 1\text{ to }n_{\text{iterations}}italic_i ← 1 to italic_n start_POSTSUBSCRIPT iterations end_POSTSUBSCRIPT do
   kssubscript𝑘𝑠k_{s}italic_k start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT, Kssubscript𝐾𝑠absentK_{s}\leftarrowitalic_K start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT ← SearchBP(X𝑋Xitalic_X, U𝑈Uitalic_U, M𝑀Mitalic_M, Jχsubscript𝐽𝜒J_{\chi}italic_J start_POSTSUBSCRIPT italic_χ end_POSTSUBSCRIPT, O𝑂Oitalic_O)
   repeat
     X𝑋Xitalic_X, U𝑈Uitalic_U, M𝑀Mitalic_M, J𝐽Jitalic_J, O𝑂absentO\leftarrowitalic_O ← ForwardPass(X𝑋Xitalic_X, U𝑈Uitalic_U, M𝑀Mitalic_M, kssubscript𝑘𝑠k_{s}italic_k start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT, Kssubscript𝐾𝑠K_{s}italic_K start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT)
     Ktsubscript𝐾𝑡K_{t}italic_K start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, Jχsubscript𝐽𝜒J_{\chi}italic_J start_POSTSUBSCRIPT italic_χ end_POSTSUBSCRIPT, P𝑃absentP\leftarrowitalic_P ← TrackingBP(X𝑋Xitalic_X, U𝑈Uitalic_U, M𝑀Mitalic_M, J𝐽Jitalic_J)
   until LineSearchIsSatisfied(Jχsubscript𝐽𝜒J_{\chi}italic_J start_POSTSUBSCRIPT italic_χ end_POSTSUBSCRIPT) return X𝑋Xitalic_X, U𝑈Uitalic_U, M𝑀Mitalic_M, Ktsubscript𝐾𝑡K_{t}italic_K start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT

Because the scalar χ𝜒\chiitalic_χ is derived from the norm of the matrix ΦΦ\Phiroman_Φ, the gradient of χ𝜒\chiitalic_χ relies on computing a 3-dimensional tensor of A~isubscript~𝐴𝑖\tilde{A}_{i}over~ start_ARG italic_A end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and ΞΞ\Xiroman_Ξ derivatives, and the Hessian of χ𝜒\chiitalic_χ is computed from a 4-dimensional tensor of matrix second derivatives. While recent work has enabled faster computation of second derivatives of dynamics [37] which can aid in the computation of the 3-D tensor derivatives, computing 4-D tensor derivatives is generally untenable. Instead, numerical methods like finite differences for gradients and BFGS [38] for Hessians can perform at reasonable speed. In order to approach real-time computation, it is likely that the full Hessian of χ𝜒\chiitalic_χ is not necessary to find an appropriate search direction and that a partial computation or even leaving out the Hessian completely is sufficient to compute optimal trajectories. Future work will address this gap. Nonetheless, the algorithm in its current form can still be useful for offline planning for trajectories that are expected to have a high degree of risk, such as lea** across ledges or traversing narrow beams. In real-world applications, it can be acceptable for a robot to pause and plan a safe trajectory before executing these dangerous maneuvers.

TABLE I: Rocket Hopper Convergence Measure and Simulation Results
Trial LQR Parameters Convergence Measure Mean Simulated Error Ratio Mean Simulated Feedback Effort
Qχsubscript𝑄𝜒Q_{\chi}italic_Q start_POSTSUBSCRIPT italic_χ end_POSTSUBSCRIPT QNsubscript𝑄𝑁Q_{N}italic_Q start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT Riairsubscript𝑅subscript𝑖airR_{i_{\text{air}}}italic_R start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT air end_POSTSUBSCRIPT end_POSTSUBSCRIPT Ristancesubscript𝑅subscript𝑖stanceR_{i_{\text{stance}}}italic_R start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT stance end_POSTSUBSCRIPT end_POSTSUBSCRIPT Vanilla χ𝜒\chiitalic_χ-iLQR %Difference Vanilla χ𝜒\chiitalic_χ-iLQR %Difference Vanilla χ𝜒\chiitalic_χ-iLQR %Difference
 1 50 500I𝐼Iitalic_I 0.01I0.01𝐼0.01I0.01 italic_I 0.1I0.1𝐼0.1I0.1 italic_I 1.01 0.71 -29.70% 0.42 0.33 -21.72% 1.741051.74superscript1051.74\cdot 10^{-5}1.74 ⋅ 10 start_POSTSUPERSCRIPT - 5 end_POSTSUPERSCRIPT 1.61051.6superscript1051.6\cdot 10^{-5}1.6 ⋅ 10 start_POSTSUPERSCRIPT - 5 end_POSTSUPERSCRIPT -7.16%
2 50 800I𝐼Iitalic_I 0.005I0.005𝐼0.005I0.005 italic_I 0.01I0.01𝐼0.01I0.01 italic_I 0.78 0.51 -34.50% 0.32 0.24 -25.74% 4.661054.66superscript1054.66\cdot 10^{-5}4.66 ⋅ 10 start_POSTSUPERSCRIPT - 5 end_POSTSUPERSCRIPT 3.251053.25superscript1053.25\cdot 10^{-5}3.25 ⋅ 10 start_POSTSUPERSCRIPT - 5 end_POSTSUPERSCRIPT -30.31%
3 50 250I𝐼Iitalic_I 0.02I0.02𝐼0.02I0.02 italic_I 0.05I0.05𝐼0.05I0.05 italic_I 1.14 0.94 -17.70% 0.49 0.45 -8.00% 7.121067.12superscript1067.12\cdot 10^{-6}7.12 ⋅ 10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT 7.051067.05superscript1067.05\cdot 10^{-6}7.05 ⋅ 10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT -1.01%
4 75 500I𝐼Iitalic_I 0.01I0.01𝐼0.01I0.01 italic_I 0.01I0.01𝐼0.01I0.01 italic_I 1.01 0.68 -33.24% 0.41 0.32 -21.73% 1.891051.89superscript1051.89\cdot 10^{-5}1.89 ⋅ 10 start_POSTSUPERSCRIPT - 5 end_POSTSUPERSCRIPT 1.621051.62superscript1051.62\cdot 10^{-5}1.62 ⋅ 10 start_POSTSUPERSCRIPT - 5 end_POSTSUPERSCRIPT -14.20%

V Examples and Results

In this section, we demonstrate the convergence improvements of our method on a spring hopper system and a planar quadruped robot model. Simulation results show that the improved convergence measure correlates with an improvement in average tracking performance, robustness to large disturbances, and feedback control effort. Both examples were implemented in MATLAB, with forward simulations using the ode113 function. Cost function gradients were computed using (11), derivatives of A~isubscript~𝐴𝑖\tilde{A}_{i}over~ start_ARG italic_A end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and Ξ(i,i+1)subscriptΞ𝑖𝑖1\Xi_{(i,i+1)}roman_Ξ start_POSTSUBSCRIPT ( italic_i , italic_i + 1 ) end_POSTSUBSCRIPT were computed with finite differences, and Hessians were computed with BFGS.

V-A Rocket Hopper

V-A1 Rocket Hopper Model

This system is made up of a point mass body with a single massless spring leg. The state of the hopper is characterized by the positions xBsubscript𝑥𝐵x_{B}italic_x start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT, yBsubscript𝑦𝐵y_{B}italic_y start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT of the body, the angle θ𝜃\thetaitalic_θ of the leg and their derivatives x˙Bsubscript˙𝑥𝐵\dot{x}_{B}over˙ start_ARG italic_x end_ARG start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT, y˙Bsubscript˙𝑦𝐵\dot{y}_{B}over˙ start_ARG italic_y end_ARG start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT, θ˙˙𝜃\dot{\theta}over˙ start_ARG italic_θ end_ARG such that the full state is a 6×1616\times 16 × 1 vector. The system has two domains: an aerial phase D1subscript𝐷1D_{1}italic_D start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and a stance phase D2subscript𝐷2D_{2}italic_D start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT. Taking a constant ground height at zero gives a touchdown guard function g(1,2)subscript𝑔12g_{(1,2)}italic_g start_POSTSUBSCRIPT ( 1 , 2 ) end_POSTSUBSCRIPT that is the height of the foot and a liftoff guard function g(2,1)subscript𝑔21g_{(2,1)}italic_g start_POSTSUBSCRIPT ( 2 , 1 ) end_POSTSUBSCRIPT that is the ground reaction force applied by the spring leg. Both reset maps R(1,2)subscript𝑅12R_{(1,2)}italic_R start_POSTSUBSCRIPT ( 1 , 2 ) end_POSTSUBSCRIPT and R(2,1)subscript𝑅21R_{(2,1)}italic_R start_POSTSUBSCRIPT ( 2 , 1 ) end_POSTSUBSCRIPT are identity since position and velocity are continuous.

The system has two inputs: a hip actuation and an actuation in the direction of the leg. In the air, this allows the hopper to rotate the leg around the body and exert a propulsion in the direction of the leg, somewhat akin to a rocket, though this force can approximate forces from other legs or actuators. A small rotor inertia in the air ensures the dynamics are well-conditioned when controlling the massless leg. In stance, the hip torque and rocket force exert ground reaction forces on the body. The body mass of the hopper was chosen as 1 kg, spring constant as 250 N/m, and resting leg length as 0.75 m.

V-A2 Rocket Hopper Results

The objective for this system is to begin in the air at rest with a height of 2 m and end in the air at rest with the same height displaced 0.2 m horizontally. The system is given 1.5 s for this trajectory.

We generated four trials of paired trajectories with varied weighting parameters, shown in Table I, and compared the performance of the standard (vanilla) iLQR method (where Qχ=0subscript𝑄𝜒0Q_{\chi}=0italic_Q start_POSTSUBSCRIPT italic_χ end_POSTSUBSCRIPT = 0) to χ𝜒\chiitalic_χ-iLQR. There is no reference trajectory to track, so Qisubscript𝑄𝑖Q_{i}italic_Q start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is zero for all trials.

For 3 of these trials, vanilla iLQR generated a trajectory with χ>1𝜒1\chi>1italic_χ > 1, meaning the worst-case error direction was expansive, see Table I. χ𝜒\chiitalic_χ-iLQR decreases every convergence measure to below 1 so that all error directions are reduced over the trajectory. On average, χ𝜒\chiitalic_χ-iLQR decreased χ𝜒\chiitalic_χ by 28.79% compared to the vanilla method.

To validate these trajectories, each closed-loop trajectory was simulated 100 times with equivalent small random initial perturbations in both positions and velocities with covariance matrix cov(X0)=104Icovsubscript𝑋0superscript104𝐼\text{cov}(X_{0})=10^{-4}Icov ( italic_X start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) = 10 start_POSTSUPERSCRIPT - 4 end_POSTSUPERSCRIPT italic_I. A small covariance was chosen so that the linearizations assumed in the convergence measure and LQR control are valid. For each simulation run, the initial error δx0𝛿subscript𝑥0\delta x_{0}italic_δ italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and the final error δxf𝛿subscript𝑥𝑓\delta x_{f}italic_δ italic_x start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT were recorded, along with the sequence of control inputs V:={v0,v1,,vN1}assign𝑉subscript𝑣0subscript𝑣1subscript𝑣𝑁1V:=\{v_{0},v_{1},...,v_{N-1}\}italic_V := { italic_v start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_v start_POSTSUBSCRIPT italic_N - 1 end_POSTSUBSCRIPT }. Note that these inputs are distinct from the nominal feedforward inputs to the system U𝑈Uitalic_U because there is additional feedback effort exerted by the actuators.

Two values were recorded during each simulation run to characterize the convergence properties of the trajectories. The first is the error ratio, E=δxf2δx02𝐸subscriptnorm𝛿subscript𝑥𝑓2subscriptnorm𝛿subscript𝑥02E=\frac{||\delta x_{f}||_{2}}{||\delta x_{0}||_{2}}italic_E = divide start_ARG | | italic_δ italic_x start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT | | start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_ARG start_ARG | | italic_δ italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT | | start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_ARG defined as the ratio of the final error 2-norm to the initial error 2-norm. A lower error ratio means better tracking performance, and E<1𝐸1E<1italic_E < 1 indicates a net reduction in error on average. The second value is the feedback effort, F=i=0N1(viui)2𝐹superscriptsubscript𝑖0𝑁1superscriptsubscript𝑣𝑖subscript𝑢𝑖2F=\sum_{i=0}^{N-1}{(v_{i}-u_{i})^{2}}italic_F = ∑ start_POSTSUBSCRIPT italic_i = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N - 1 end_POSTSUPERSCRIPT ( italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - italic_u start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT which is the sum of squares of the difference between V𝑉Vitalic_V and U𝑈Uitalic_U.

Table I shows that the simulation results support our assertion that an improved convergence measure correlates with an improved mean tracking performance and feedback effort. The mean error ratio and feedback effort over the 100 simulations were both lower for trajectories generated with χ𝜒\chiitalic_χ-iLQR. The average improvement over the four trials was 19.30% for mean error ratio and 13.17% for feedback effort.

None of the simulated runs had an error ratio greater than one, which is sensible since the worst-case direction occurs with probability zero. However, even if none of the sampled initial errors aligned exactly with the worst-case direction predicted by the fundamental solution matrix, nearby initial error directions still see improvement in convergence, which explains the improvement in mean simulated error ratio.

V-B Planar Quadruped

Here we demonstrate the improvements of χ𝜒\chiitalic_χ-iLQR on a more complex robot model akin a standard quadruped robot. The model is simplified as a planar quadruped, meaning that all movement occurs in the sagittal plane and the left-right pairs of legs are constrained to move identically.

V-B1 Planar Quadruped Model

In the sagittal plane, we can model the robot with 7 positional states. xBsubscript𝑥𝐵x_{B}italic_x start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT, yBsubscript𝑦𝐵y_{B}italic_y start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT, θBsubscript𝜃𝐵\theta_{B}italic_θ start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT are the position and orientation of the body. The front and back sets of legs each have two states for the hip angle αf,αbsubscript𝛼𝑓subscript𝛼𝑏\alpha_{f},\alpha_{b}italic_α start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT , italic_α start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT and knee angle βf,βbsubscript𝛽𝑓subscript𝛽𝑏\beta_{f},\beta_{b}italic_β start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT , italic_β start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT. Thus the full state is dimension 14.

This system has four domains: the aerial domain D1subscript𝐷1D_{1}italic_D start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, front stance domain D2subscript𝐷2D_{2}italic_D start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, back stance domain D3subscript𝐷3D_{3}italic_D start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT, and full stance domain D4subscript𝐷4D_{4}italic_D start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT. The impact guard function is the height of the foot and the guard function for liftoff is the vertical ground reaction force. The dynamics of the robot body in the aerial phase follow ballistic motion, while the legs are simplified to be massless while including the aforementioned rotor inertia. The impact reset map for each foot consists of a discrete update to the hip and knee velocities, while the body states are unchanged due to the massless legs. The liftoff reset map is identity. The input vector for this system is 4-dimensional to actuate the hip and knee joints.

In this model, parallel torsion springs are added to the knee joints. Parallel joint springs have been utilized to mimic tendons found in animals [39] that increase the energy efficiency of legged locomotion [40, 41]. Due to the resonance of the natural spring dynamics, controlling these systems requires special care [42, 43]. For example, [44] solved for optimal gait timings to leverage resonant spring frequencies. These spring models of legged robots are good candidates for χ𝜒\chiitalic_χ-iLQR because the dynamics of the stance phase depend strongly on the leg configuration at touchdown. Thus, a small error in leg states at touchdown can have a large effect on tracking performance.

The inertial and dimensional properties were chosen to match the Ghost Robotics Spirit 40 quadruped. The added torsional knee spring has a spring constant 75 Nmrad175 Nmsuperscriptrad175\text{ N}\cdot\text{m}\cdot\text{rad}^{-1}75 N ⋅ m ⋅ rad start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT and rest angle 1.2 rad.

V-B2 Planar Quadruped Results

The trajectory optimization task for the planar quadruped is to generate a gait with a forward velocity of 0.25 m/s. The robot begins in the air with a body height of 0.3 m. The hip joints begin at an angle of 0.6 rad and the knee joints begin at 1.2 rad. The terminal target state is translated 0.0875 m in the x-direction from the initial state. The trajectory is given 0.35 s to execute. We choose to set a constant input weight of Ri=5104Isubscript𝑅𝑖5superscript104𝐼R_{i}=5\cdot 10^{-4}Iitalic_R start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = 5 ⋅ 10 start_POSTSUPERSCRIPT - 4 end_POSTSUPERSCRIPT italic_I. The terminal weight is QN=500Isubscript𝑄𝑁500𝐼Q_{N}=500Iitalic_Q start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT = 500 italic_I and the convergence weight for χ𝜒\chiitalic_χ-iLQR is Qχ=1subscript𝑄𝜒1Q_{\chi}=1italic_Q start_POSTSUBSCRIPT italic_χ end_POSTSUBSCRIPT = 1.

We set up a similar experiment to the prior example, with the addition of simulating over a range of covariance magnitudes. This is done to evaluate the basin of attraction of each trajectory over larger initial errors that introduce greater nonlinear effects. The two trajectories were evaluated with 6 sets of 100 paired simulation runs with random initial error covariance magnitudes of 104superscript10410^{-4}10 start_POSTSUPERSCRIPT - 4 end_POSTSUPERSCRIPT, 51045superscript1045\cdot 10^{-4}5 ⋅ 10 start_POSTSUPERSCRIPT - 4 end_POSTSUPERSCRIPT, 103superscript10310^{-3}10 start_POSTSUPERSCRIPT - 3 end_POSTSUPERSCRIPT, 51035superscript1035\cdot 10^{-3}5 ⋅ 10 start_POSTSUPERSCRIPT - 3 end_POSTSUPERSCRIPT, 102superscript10210^{-2}10 start_POSTSUPERSCRIPT - 2 end_POSTSUPERSCRIPT, and 51025superscript1025\cdot 10^{-2}5 ⋅ 10 start_POSTSUPERSCRIPT - 2 end_POSTSUPERSCRIPT in each direction. The lowest covariance magnitude of 104superscript10410^{-4}10 start_POSTSUPERSCRIPT - 4 end_POSTSUPERSCRIPT approximates local linear behavior well, while 51025superscript1025\cdot 10^{-2}5 ⋅ 10 start_POSTSUPERSCRIPT - 2 end_POSTSUPERSCRIPT is the maximum magnitude before some trials begin with the robot’s feet below the ground.

Table II shows the vanilla cost (6), convergence measure, mean simulated error ratio, and mean simulated feedback effort of the two trajectories at the covariance magnitude 104superscript10410^{-4}10 start_POSTSUPERSCRIPT - 4 end_POSTSUPERSCRIPT. As expected, the vanilla cost of the convergent trajectory increases since its optimizing for a different cost function, while the convergence measure and simulation values improve. We argue that in dynamic legged locomotion, a costlier nominal trajectory can often be worth an improvement in the trajectory’s robustness.

TABLE II: Mean convergence results for the quadruped model for vanilla and χ𝜒\chiitalic_χ-iLQR with covariance magnitude 104superscript10410^{-4}10 start_POSTSUPERSCRIPT - 4 end_POSTSUPERSCRIPT
Vanilla χ𝜒\chiitalic_χ-iLQR %Difference
 Vanilla Cost 65.58 74.39 +13.43%
Convergence Measure 60.52 41.35 -31.68%
Mean Simulated Error Ratio 6.66 4.78 -28.23%
Mean Simulated Feedback Effort 0.016 0.013 -16.56%

The mean simulated error ratio for the convergent trajectory at this small covariance magnitude was 28.23% less and the mean simulated feedback effort was 16.56% less. Fig. 2 displays a histogram of the error ratio for each of the trials, with the convergent trajectory having improved performance.

As the magnitude of initial errors grows, the performance of the LQR tracking controller becomes worse due to the increase in nonlinear effects. Fig. 3 shows the simulation results for each trajectory over a range of initial error covariance magnitudes. Each pair of lines indicates the success rate of the respective closed-loop trajectories at maintaining error ratios of less than 50, 10, and 5 respectively. An error ratio of greater than 50 is representative of a catastrophic failure, which the vanilla trajectory encounters at a covariance magnitude of 51045superscript1045\cdot 10^{-4}5 ⋅ 10 start_POSTSUPERSCRIPT - 4 end_POSTSUPERSCRIPT, while the convergent trajectory first experiences a failure at covariance magnitude 102superscript10210^{-2}10 start_POSTSUPERSCRIPT - 2 end_POSTSUPERSCRIPT. This difference in performance suggests the convergent trajectory is more robust to larger initial errors and nonlinearities.

Refer to caption
Figure 2: Histogram of error ratio for 100 paired simulated trials of the quadruped model with small initial perturbations. Error ratio is the 2-norm of final errors divided by the 2-norm of initial errors.

Even with the χ𝜒\chiitalic_χ-iLQR convergence improvements, the controller does not reduce errors in all directions. The simulation results show there was usually some error growth, which is reasonable since the body dynamics are fully unactuated in the aerial phase and the system undergoes multiple hybrid events. A combination of higher feedback gains and a global footstep planner could be able to grant this system full convergence. Even so, this work can be valuable to ensure that the system does not diverge too far from its target trajectory between iterations of a global planner.

Refer to caption
Figure 3: Plots showing success of LQR controllers at tracking vanilla (V𝑉Vitalic_V) and convergent (χ𝜒\chiitalic_χ) trajectories over various initial perturbation covariances. V50subscript𝑉50V_{50}italic_V start_POSTSUBSCRIPT 50 end_POSTSUBSCRIPT and χ50subscript𝜒50\chi_{50}italic_χ start_POSTSUBSCRIPT 50 end_POSTSUBSCRIPT indicate proportion of trials where error ratio was below 50, V10subscript𝑉10V_{10}italic_V start_POSTSUBSCRIPT 10 end_POSTSUBSCRIPT and χ10subscript𝜒10\chi_{10}italic_χ start_POSTSUBSCRIPT 10 end_POSTSUBSCRIPT below 10, and V5subscript𝑉5V_{5}italic_V start_POSTSUBSCRIPT 5 end_POSTSUBSCRIPT and χ5subscript𝜒5\chi_{5}italic_χ start_POSTSUBSCRIPT 5 end_POSTSUBSCRIPT below 5.

VI Conclusion

In this work, we present a novel trajectory optimization method, χ𝜒\chiitalic_χ-iLQR that optimizes over the worst-case error growth of a hybrid trajectory. This method is based on the fundamental solution matrix, which maps the evolution of perturbations through a trajectory. Incorporating the saltation matrix into the fundamental solution matrix allows for straightforward handling of hybrid events. The simulation results presented on two legged robot models demonstrate that this method produces trajectories with improved tracking performance, decreased feedback actuation effort, and improved robustness to large perturbations. Even for a quadrupedal trajectory that was very difficult to track, χ𝜒\chiitalic_χ-iLQR produced a trajectory that was superior at avoiding failures.

Following this work, we aim to apply these principles to robot hardware and demonstrate robust performance for maneuvers such as lea** and flip**. We also aim to apply this work to other hybrid systems like robots that undergo stick-slip transitions. This work and its extensions will further enable robots to navigate complex, uncertain environments and unlock worlds for robots to explore.

References

  • [1] A. M. Johnson and D. E. Koditschek, “Toward a vocabulary of legged lea**,” in IEEE Intl. Conference on Robotics and Automation, Karlsruhe, Germany, May 2013, pp. 2553–2560.
  • [2] H. Kolvenbach, E. Hampp, P. Barton et al., “Towards jum** locomotion for quadruped robots on the moon,” IEEE/RSJ International Conference on Intelligent Robots and Systems, 2019.
  • [3] C. Nguyen and Q. Nguyen, “Contact-timing and trajectory optimization for 3d jum** on quadruped robots,” IEEE/RSJ International Conference on Intelligent Robots and Systems, 2022.
  • [4] Z. Li, X. B. Peng, P. Abbeel et al., “Robust and versatile bipedal jum** control through multi-task reinforcement learning,” arXiv preprint arXiv:2302.09450, 2023.
  • [5] A. van der Schaft and J. Schumacher, “Complementarity modeling of hybrid systems,” IEEE Transactions on Automatic Control, 1998.
  • [6] J. Lygeros, K. H. Johansson, S. Simic et al., “Dynamical properties of hybrid automata,” IEEE Transactions on Automatic Control, 2003.
  • [7] R. Tedrake, Underactuated Robotics, 2022. [Online]. Available: http://underactuated.mit.edu
  • [8] N. J. Kong, J. J. Payne, J. Zhu, and A. M. Johnson, “Saltation matrices: The essential tool for linearizing hybrid dynamical systems,” arXiv preprint arXiv:2306.06862, 2023.
  • [9] M. A. Haidekker, Linear Feedback Controls, M. A. Haidekker, Ed.   Elsevier, 2020.
  • [10] J. Norby, A. Tajbakhsh, Y. Yang, and A. M. Johnson, “Adaptive complexity model predictive control,” arXiv preprint arXiv:2209.02849, 2022.
  • [11] Z. Manchester and S. Kuindersma, “Robust direct trajectory optimization using approximate invariant funnels,” Auton. Robots, 2019.
  • [12] J. Morimoto, G. Zeglin, and C. Atkeson, “Minimax differential dynamic programming: application to a biped walking robot,” IEEE/RSJ International Conference on Intelligent Robots and Systems, 2003.
  • [13] T. Lew, R. Bonalli, and M. Pavone, “Chance-constrained sequential convex programming for robust trajectory optimization,” European Control Conference, 2020.
  • [14] B. Hammoud, M. Khadiv, and L. Righetti, “Impedance optimization for uncertain contact interactions through risk sensitive optimal control,” IEEE Robotics and Automation Letters, 2021.
  • [15] M. Ahmadi, X. Xiong, and A. D. Ames, “Risk-sensitive path planning via CVaR barrier functions: Application to bipedal locomotion,” IEEE Control Systems Letters, 2021.
  • [16] L. Drnach and Y. Zhao, “Robust trajectory optimization over uncertain terrain with stochastic complementarity,” IEEE Robotics and Automation Letters, 2021.
  • [17] S. Kousik, S. Vaskov, F. Bu et al., “Bridging the gap between safety and real-time performance in receding-horizon trajectory design for mobile robots,” The International Journal of Robotics Research, 2020.
  • [18] C.-Y. Lee, S. Yang, B. Bokser, and Z. Manchester, “Enhanced balance for legged robots using reaction wheels,” IEEE International Conference on Robotics and Automation, 2023.
  • [19] Y. Yang, J. Norby, J. K. Yim, and A. M. Johnson, “Proprioception and tail control enable extreme terrain traversal by quadruped robots,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2023.
  • [20] H. Khalil, Nonlinear Systems.   Pearson, 2002.
  • [21] A. Papachristodoulou and S. Prajna, “On the construction of Lyapunov functions using the sum of squares decomposition,” IEEE Conference on Decision and Control, 2002.
  • [22] S. Chen, M. Fazlyab, M. Morari et al., “Learning Lyapunov functions for hybrid systems,” International Conference on Hybrid Systems: Computation and Control, 2021.
  • [23] R. Grandia, A. J. Taylor, A. D. Ames, and M. Hutter, “Multi-layered safety for legged robots via control barrier functions and model predictive control,” IEEE International Conference on Robotics and Automation, 2021.
  • [24] W. Lohmiller and J. Slotine, “On contraction analysis for non-linear systems,” Automatica, 1998.
  • [25] A. M. Johnson, J. E. King, and S. Srinivasa, “Convergent planning,” IEEE Robotics and Automation Letters, 2016.
  • [26] N. Kong and A. M. Johnson, “Optimally convergent trajectories for navigation,” in International Symposium on Robotics Research, October 2019.
  • [27] S. A. Burden, T. Libby, and S. D. Coogan, “On contraction analysis for hybrid systems,” arXiv preprint arXiv:1811.03956, 2018.
  • [28] J. Zhu, N. Kong, G. Council, and A. Johnson, “Hybrid event sha** to stabilize periodic hybrid orbits,” IEEE International Conference on Robotics and Automation, 2022.
  • [29] A. M. Johnson, S. A. Burden, and D. E. Koditschek, “A hybrid systems model for simple manipulation and self-manipulation systems,” The International Journal of Robotics Research, 2016.
  • [30] M. Hirsch, S. Smale, and R. Devaney, Differential Equations, Dynamical Systems, and an Introduction to Chaos.   Academic Press, 2012.
  • [31] R. Leine and H. Nijmeijer, Dynamics and Bifurcations of Non-Smooth Mechanical Systems.   Springer-Verlag Berlin Heidelberg, 2004.
  • [32] W. Li and E. Todorov, “Iterative linear quadratic regulator design for nonlinear biological movement systems,” International Conference on Informatics in Control, Automation and Robotics, 2004.
  • [33] M. Kelly, “An introduction to trajectory optimization: How to do your own direct collocation,” SIAM Review, 2017.
  • [34] N. J. Kong, G. Council, and A. M. Johnson, “iLQR for piecewise-smooth hybrid dynamical systems,” IEEE Conference on Decision and Control, 2021.
  • [35] N. Kong, C. Li, G. Council, and A. M. Johnson, “Hybrid iLQR model predictive control for contact implicit stabilization on legged robots,” IEEE Transactions on Robotics, 2023, to appear. Also available at arXiv:2207.04591 [cs.RO].
  • [36] J. Townsend, “Differentiating the singular value decomposition,” Tech. Rep., 2016. [Online]. Available: https://j-towns.github.io/papers/svd-derivative.pdf
  • [37] S. Singh, R. P. Russell, and P. M. Wensing, “On second-order derivatives of rigid-body dynamics: Theory & implementation,” arXiv preprint arXiv:2302.06001, 2023.
  • [38] C. G. Broyden, “The convergence of a class of double-rank minimization algorithms 1. General considerations,” IMA Journal of Applied Mathematics, 1970.
  • [39] K. Endo and H. Herr, “A model of muscle-tendon function in human walking at self-selected speed,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2014.
  • [40] M. Scheint, M. Sobotka, and M. Buss, “Optimized parallel joint springs in dynamic motion: Comparison of simulation and experiment,” IEEE RAS & EMBS International Conference on Biomedical Robotics and Biomechatronics, 2010.
  • [41] K. Mombaur, “Using optimization to create self-stable human-like running,” Robotica, 2009.
  • [42] D. Lakatos, K. Ploeger, F. Loeffl et al., “Dynamic locomotion gaits of a compliantly actuated quadruped with slip-like articulated legs embodied in the mechanical design,” IEEE Robotics and Automation Letters, 2018.
  • [43] J. Ackerman and J. Seipel, “Energy efficiency of legged robot locomotion with elastically suspended loads,” IEEE Transactions on Robotics, 2013.
  • [44] M. J. Pollayil, C. D. Santina, G. Mesesan et al., “Planning natural locomotion for articulated soft quadrupeds,” IEEE International Conference on Robotics and Automation, 2022.