Learning to Boost the Performance
of Stable Nonlinear Systems

Luca Furieri, Clara Lucía Galimberti, and Giancarlo Ferrari-Trecate

L. Furieri, C. L. Galimberti, and G. Ferrari-Trecate are with the Institute of Mechanical Engineering, EPFL, Switzerland. E-mail addresses: {luca.furieri, clara.galimberti, giancarlo.ferraritrecate}@epfl.ch. Research supported by the Swiss National Science Foundation (SNSF) under the NCCR Automation (grant agreement 51NF40_80545). Luca Furieri is also grateful to the SNSF for the Ambizione grant PZ00P2_208951.

Abstract

The growing scale and complexity of safety-critical control systems underscore the need to evolve current control architectures, aiming for the unparalleled performance achievable through state-of-the-art optimization and machine learning algorithms. However, maintaining closed-loop stability while boosting the performance of nonlinear control systems using data-driven and deep-learning approaches stands as an important unsolved challenge. In this paper, we tackle the performance-boosting problem with closed-loop stability guarantees. Specifically, we establish a synergy between the Internal Model Control (IMC) principle for nonlinear systems and state-of-the-art unconstrained optimization approaches for learning stable dynamics. Our methods enable learning over arbitrarily deep neural network classes of performance-boosting controllers for stable nonlinear systems; crucially, we guarantee $\mathcal{L}_p$ closed-loop stability even if optimization is halted prematurely, and even when the ground-truth dynamics are unknown, with vanishing conservatism in the class of stabilizing policies as the model uncertainty is reduced to zero. We discuss the implementation details of the proposed control schemes, including distributed ones, along with the corresponding optimization procedures, demonstrating the potential of freely shaping the cost functions through several numerical experiments.

Index Terms:
Optimal control, Closed-loop stability, Learning for control, Internal model control, Uncertain systems, Distributed control

I Introduction

The success of control systems across a broad spectrum of applications — from manufacturing to water, power, and transportation networks [1] — is rooted not only in advancements in sensing, computation, and communication but also in the growing availability of methods for designing model-based controllers capable of stabilizing nonlinear systems at nominal operating conditions.

However, in many applications, merely stabilizing the closed-loop system is not sufficient; achieving satisfactory performance is also crucial, often necessitating the integration of additional control loops. In Nonlinear Optimal Control (NOC), performance requirements are typically encoded in the shape of the cost function that the control policy strives to minimize. Consequently, it is beneficial to develop NOC algorithms that accommodate general nonlinear costs to enable sophisticated closed-loop behaviors, such as collision avoidance or waypoint tracking in swarms of robots.

In this paper, we tackle the following performance-boosting problem: given a discrete-time nonlinear system that is stable or has been pre-stabilized using a base controller, how can we enhance its performance during the transient — that is, before the system settles into a steady state — by employing general cost functions without compromising stability?

A first approach to designing performance-boosting regulators involves resorting to NOC methods with stability guarantees. Despite extensive research in this area [2], the NOC problem is fully understood only when the system dynamics are linear and the cost admits a convex reformulation. For nonlinear systems, traditional methods for addressing NOC include dynamic programming and the maximum principle [3, 4]. However, the computation of NOC policies through these methods often faces significant computational challenges [4]. Furthermore, to ensure stability, stringent limitations must be imposed on the class of costs that can be utilized. An alternative approach to tackling performance-boosting is offered by receding-horizon control schemes, such as Nonlinear Model Predictive Control (NMPC) [5]. These controllers are based on real-time optimization; a finite-horizon NOC problem is solved at each time instant to determine the control input. However, a significant limitation of NMPC is that the control policy can seldom be precomputed and stored in an explicit form, which makes NMPC inapplicable when the control platform lacks the computational resources necessary to solve mathematical programs in real-time. Moreover, similar to NOC, ensuring stability requires imposing strong limitations on the class of admissible cost functions [5].

More recently, Reinforcement Learning (RL) and Deep Neural Networks (DNNs) have emerged as powerful tools that enable agents to understand and optimally interact with complex environments and dynamical systems, e.g., [6, 7]. Many RL approaches are based on minimizing arbitrary cost functions, calling for the use of broad sets of candidate nonlinear control policies. To this end, RL methods often employ families of policies that incorporate deep Neural Networks (NNs), due to their ability to model rich classes of nonlinear functions. These capabilities have led to remarkable applications, such as four-legged robots navigating challenging terrains [8] and drones that can outperform humans in races [9, 10]. On the other hand, general methodologies for designing RL policies for nonlinear dynamical systems, while ensuring closed-loop stability, are currently scarce and may be limited by strong assumptions [11, 12, 13]. As a result, so far the applicability of RL approaches has been mainly limited to systems that are not safety-critical.

Independent of their application in RL, NNs have been employed in model-based control since the 1990s for approximating nonlinear receding horizon policies [14, 15] or synthesizing nonlinear regulators from scratch [16]. Recent results on the design of provably stabilizing DNN control policies fall into two categories. The first one comprises constrained optimization approaches [11, 17, 18] that ensure global or local stability by enforcing Lyapunov-like inequalities during optimization. However, conservative stability constraints can severely restrict the range of admissible policies or fail to produce a viable controller even when it exists. Additionally, enforcing constraints such as linear matrix inequalities becomes a computational bottleneck in large-scale applications.

The second category embraces unconstrained optimization approaches, aiming to define classes of control policies with built-in stability guarantees [19, 20, 21]. These methods, which are similar to those developed in this paper, allow unconstrained optimization over finitely many parameters — using, for instance, standard gradient descent techniques — without sacrificing stability, regardless of the chosen parameter values. Optimizing over sets of stabilizing policies has two main benefits. First, it completely decouples the stabilization problem from the choice of the cost being optimized. Second, it enables fail-safe design, that is, the ability to guarantee closed-loop stability even if the policy optimization ends at a local minimum or is prematurely halted. However, these approaches are limited to discrete-time linear systems [19, 20] or to continuous-time systems in the port-Hamiltonian form [21]. While recent work surpasses the limitations above [22, 23], in real-world applications, the knowledge about the system model is not perfect. The impact of modeling errors on the parametrizations of stable closed-loop maps for nonlinear systems has remained largely unexplored.

I-A Contributions

This paper explores approaches to solve performance-boosting problems in general discrete-time, time-varying systems. Specifically, we develop unconstrained optimization approaches based on classes of state-feedback policies that induce closed-loop dynamics described by stable and arbitrarily deep NNs.

After formally stating the performance-boosting problem in Section II, we present our first contribution, which provides a complete characterization of the class of stability-preserving controllers for stable systems. This result is presented in Section III and reveals that an Internal Model Control (IMC) structure [24, 25, 26] allows characterizing, without conservatism, the class of all stability-preserving controllers, where the only free parameter is an $\mathcal{L}_p$ operator. Our results hinge on adapting nonlinear variants of the Youla parametrization [27, 28] to discrete-time systems. Further, we examine the relationship with the recently proposed nonlinear System Level Synthesis (SLS) framework developed in [29]. In Section IV, we present our main contribution: the proposed approach is compatible with scenarios where only an approximate system description is available, such as models identified from data or derived from simplified physical principles. Specifically, under a finite gain assumption on the model mismatch, stability can always be preserved by embedding a nominal system model and optimizing over nonlinear controllers with a sufficiently reduced gain on the free $\mathcal{L}_p$ parameter. Importantly, the method ensures vanishing conservatism as the model uncertainty approaches zero. Additionally, by considering networks of interconnected subsystems, we demonstrate how the IMC structure of our controllers naturally lends itself to the development of distributed policies where the communication topology mirrors the subsystem couplings.

Finally, Section V bridges the gap between theoretical developments and computations, showing how to use Recurrent Equilibrium Networks (RENs) [30, 31] to obtain a finite-dimensional parametrization of performance-boosting controllers that can include DNNs. The final part of the paper in Section VI presents several simulations by considering coordination problems for mobile robots. Specifically, we show how, similarly to RL, the freedom in specifying the optimization cost allows designing NN controllers that can boost various forms of performance and safety, reaching beyond classical optimal control objectives consisting of the sum of stage-costs over time [3].

I-B Notation

Signals and operators: The set of all sequences $\mathbf{x} = (x_0, x_1, x_2, \ldots)$, where $x_t \in \mathbb{R}^n$, $t \in \mathbb{N}$, is denoted as $\ell^n$. Moreover, $\mathbf{x}$ belongs to $\ell_p^n \subset \ell^n$ with $p \in \mathbb{N} \cup \{\infty\}$ if $\lVert \mathbf{x} \rVert_p = \left(\sum_{t=0}^{\infty} |x_t|^p\right)^{\frac{1}{p}} < \infty$, where $|\cdot|$ denotes any vector norm. We say that $\mathbf{x} \in \ell^n_\infty$ if $\sup_t |x_t| < \infty$.
When clear from the context, we omit the superscript $n$ from $\ell^n$ and $\ell^n_p$. An operator $\mathbf{A}$ is said to be $\ell_p$-stable if it is causal and $\mathbf{A}(\mathbf{w}) \in \ell_p^m$ for all $\mathbf{w} \in \ell_p^n$ (for short, we also say that the operator is stable when the value of $p$ is clear from the context). Equivalently, we write $\mathbf{A} \in \mathcal{L}_p$. We say that an $\mathcal{L}_p$ operator $\mathbf{A}: \mathbf{w} \mapsto \mathbf{u}$ has finite $\mathcal{L}_p$-gain $\gamma(\mathbf{A}) > 0$ if $\|\mathbf{u}\| \leq \gamma(\mathbf{A}) \|\mathbf{w}\|$ for all $\mathbf{w} \in \ell_p^n$.
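
As a small numerical illustration of these definitions, the following sketch evaluates truncated $\ell_p$ norms and an empirical gain ratio for a simple stable operator. The filter `stable_filter`, the helper `lp_norm`, the horizon, and the choice of the Euclidean norm as the vector norm are assumptions made only for this example; a finite-horizon experiment can suggest, but never prove, a gain bound.

```python
import numpy as np

def lp_norm(x, p):
    """Truncated l_p norm of a sequence stored as an array of shape (T, n).
    The Euclidean norm plays the role of the vector norm |.|."""
    step_norms = np.linalg.norm(x, axis=1)
    if np.isinf(p):
        return step_norms.max()
    return (step_norms ** p).sum() ** (1.0 / p)

def stable_filter(w, a=0.5):
    """Hypothetical causal operator u_t = a * u_{t-1} + w_t with |a| < 1."""
    u = np.zeros_like(w)
    prev = np.zeros(w.shape[1])
    for t in range(w.shape[0]):
        prev = a * prev + w[t]
        u[t] = prev
    return u

rng = np.random.default_rng(0)
a = 0.5
# The l_1 norm of the impulse response (a^t) upper-bounds the gain of this filter.
gain_bound = 1.0 / (1.0 - abs(a))

ratios = []
for _ in range(100):
    w = rng.normal(size=(50, 2))
    ratios.append(lp_norm(stable_filter(w, a), 2) / lp_norm(w, 2))
max_ratio = max(ratios)
print(f"largest observed ratio: {max_ratio:.3f}, bound: {gain_bound:.3f}")
```

The observed ratios stay below the bound $1/(1-|a|)$, which holds for this particular linear filter; for a general nonlinear operator no such closed-form bound is assumed here.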

Time-series: We use the notation $x_{j:i}$ to refer to the truncation of $\mathbf{x}$ to the finite-dimensional vector $(x_i, x_{i+1}, \ldots, x_j)$. An operator $\mathbf{A}: \ell^n \rightarrow \ell^m$ is said to be causal if $\mathbf{A}(\mathbf{x}) = (A_0(x_0), A_1(x_{1:0}), \ldots, A_t(x_{t:0}), \ldots)$. If in addition $A_t(x_{t:0}) = A_t(x_{t-1:0}, 0)$, then $\mathbf{A}$ is said to be strictly causal.
Similarly, we define $A_{j:i}(x_{j:0}) = (A_i(x_{i:0}), A_{i+1}(x_{i+1:0}), \ldots, A_j(x_{j:0}))$. For a matrix $M \in \mathbb{R}^{m \times n}$, $M\mathbf{x} = (Mx_0, Mx_1, \ldots) \in \ell^m$.
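
The distinction between causal and strictly causal operators can be probed numerically on a finite horizon. The sketch below is only illustrative: the operators `running_sum` and `delayed_sum` and the checker `is_causal` are hypothetical names, and the check tests a single time index with one random perturbation rather than proving the property.

```python
import numpy as np

def running_sum(x):
    """Causal operator: the output at time t is x_0 + ... + x_t."""
    return np.cumsum(x, axis=0)

def delayed_sum(x):
    """Strictly causal operator: the output at time t is x_0 + ... + x_{t-1}."""
    out = np.zeros_like(x)
    out[1:] = np.cumsum(x, axis=0)[:-1]
    return out

def is_causal(op, x, t, strict=False, seed=0):
    """Check that outputs up to time t ignore inputs after time t
    (or inputs from time t onward when strict=True)."""
    rng = np.random.default_rng(seed)
    x_pert = x.copy()
    start = t if strict else t + 1
    x_pert[start:] = rng.normal(size=x_pert[start:].shape)
    return np.allclose(op(x)[: t + 1], op(x_pert)[: t + 1])

x = np.arange(12, dtype=float).reshape(6, 2)  # 6 time steps, n = 2
causal_ok = is_causal(running_sum, x, t=3)
strict_ok = is_causal(delayed_sum, x, t=3, strict=True)
running_sum_strict = is_causal(running_sum, x, t=3, strict=True)
print(causal_ok, strict_ok, running_sum_strict)
```

Here `running_sum` passes the causality check but fails the strict one, since its output at time $t$ depends on $x_t$.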

Graph theory: Given an undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ described by the set of nodes $\mathcal{V} = \{1, \ldots, N\}$ and the set of edges $\mathcal{E} \subset \mathcal{V} \times \mathcal{V}$, we denote the set of neighbors of node $i$, including $i$ itself, by $\mathcal{N}_i = \{i\} \cup \{j \mid \{i,j\} \in \mathcal{E}\} \subseteq \mathcal{V}$. We denote with $\operatorname{col}_{j \in \mathcal{V}}(v^{[j]})$ a vector which consists of the stacked subvectors $v^{[j]}$ from $j = 1$ to $j = N$, and with $v^{[\mathcal{N}_i]}$ a vector composed of the stacked subvectors $v^{[j]}$ of all neighbors of node $i$, i.e., $v^{[\mathcal{N}_i]} = \operatorname{col}_{j \in \mathcal{N}_i}(v^{[j]})$.
For a signal $\mathbf{x} \in \ell^n$, where $x_t = \operatorname{col}_{i \in \mathcal{V}}(x_t^{[i]})$, $x_t^{[i]} \in \mathbb{R}^{n_i}$, and $n = \sum_{i=1}^{N} n_i$, we denote with $\mathbf{x}^{[i]} \in \ell^{n_i}$ the sequence $\mathbf{x}^{[i]} = (x_0^{[i]}, x_1^{[i]}, \ldots)$.
Similarly, we define the sequence $\mathbf{x}^{[\mathcal{N}_i]} = (x_0^{[\mathcal{N}_i]}, x_1^{[\mathcal{N}_i]}, \ldots)$.
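
The neighbor sets and the stacking operator can be made concrete with a short sketch. The three-node path graph, the subvector values, and the helper names `neighbors` and `col` are assumptions chosen for illustration; neighbors are stacked in increasing node order, which is one possible convention.

```python
import numpy as np

def neighbors(i, edges):
    """Return N_i: node i together with every node sharing an edge with i."""
    nbrs = {i}
    for a, b in edges:
        if a == i:
            nbrs.add(b)
        elif b == i:
            nbrs.add(a)
    return sorted(nbrs)

def col(subvectors, nodes):
    """Stack the subvectors v^[j], for j in nodes, into a single vector."""
    return np.concatenate([subvectors[j] for j in nodes])

edges = [(1, 2), (2, 3)]  # undirected path graph 1 - 2 - 3
v = {1: np.array([1.0]), 2: np.array([2.0, 3.0]), 3: np.array([4.0])}

N1 = neighbors(1, edges)   # nodes 1 and 2
N2 = neighbors(2, edges)   # nodes 1, 2 and 3
v_all = col(v, [1, 2, 3])  # col over all nodes
v_N1 = col(v, N1)          # stacked subvectors of node 1 and its neighbors
print(N1, N2, v_all, v_N1)
```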

II The Performance-boosting Problem

We consider nonlinear discrete-time time-varying systems

\begin{equation}
x_t = f_t(x_{t-1:0}, u_{t-1:0}) + w_t\,, \quad t = 1, 2, \ldots\,, \tag{1}
\end{equation}

where $x_t \in \mathbb{R}^n$ is the state vector, $u_t \in \mathbb{R}^m$ is the control input, $w_t \in \mathbb{R}^n$ stands for unknown process noise with $w_0 = x_0$, and $f_0 = 0$. The system model (1) is very general. For instance, it can describe the dynamics of the error between the state of a nonlinear system and a reference trajectory in $\ell_p$. In operator form, system (1) is equivalent to

\begin{equation}
\mathbf{x} = \mathbf{F}(\mathbf{x}, \mathbf{u}) + \mathbf{w}\,, \tag{2}
\end{equation}

where $\mathbf{F}: \ell^n \times \ell^m \rightarrow \ell^n$ is the strictly causal operator such that $\mathbf{F}(\mathbf{x}, \mathbf{u}) = (0, f_1(x_0, u_0), \ldots, f_t(x_{t-1:0}, u_{t-1:0}), \ldots)$. Note that $\mathbf{w} = (x_0, w_1, \ldots)$ and $\mathbf{u}$ together collect all data needed for defining the system evolution over an infinite horizon. As an example, when the system (1) takes the Linear Time Invariant (LTI) form

\begin{equation}
x_t = A x_{t-1} + B u_{t-1} + w_t\,, \tag{3}
\end{equation}

the model (2) becomes

\begin{equation*}
\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \end{bmatrix}
=
\begin{bmatrix} 0 & 0 & 0 & \cdots \\ A & 0 & 0 & \cdots \\ 0 & A & 0 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix}
\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \end{bmatrix}
+
\begin{bmatrix} 0 & 0 & 0 & \cdots \\ B & 0 & 0 & \cdots \\ 0 & B & 0 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix}
\begin{bmatrix} u_0 \\ u_1 \\ u_2 \\ \vdots \end{bmatrix}
+
\begin{bmatrix} x_0 \\ w_1 \\ w_2 \\ \vdots \end{bmatrix}\,.
\end{equation*}
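
The LTI operator form can be checked numerically on a finite horizon. The following sketch truncates the infinite block matrices to $T + 1$ time steps and verifies that a trajectory generated by the recursion (3) satisfies the stacked equation; the matrices `A` and `B`, the horizon, and the helper names are arbitrary choices made for illustration.

```python
import numpy as np

def shift_block_matrix(M, T):
    """Block matrix with M on the first block sub-diagonal, over T + 1 time steps."""
    return np.kron(np.eye(T + 1, k=-1), M)

def simulate_lti(A, B, u, w):
    """Roll out x_t = A x_{t-1} + B u_{t-1} + w_t with x_0 = w_0."""
    x = np.zeros_like(w)
    x[0] = w[0]
    for t in range(1, w.shape[0]):
        x[t] = A @ x[t - 1] + B @ u[t - 1] + w[t]
    return x

rng = np.random.default_rng(0)
T, n, m = 5, 2, 1
A = np.array([[0.5, 0.1], [0.0, 0.3]])  # illustrative stable matrix
B = np.array([[0.0], [1.0]])
u = rng.normal(size=(T + 1, m))
w = rng.normal(size=(T + 1, n))  # w[0] plays the role of x_0

x = simulate_lti(A, B, u, w)
F_x = shift_block_matrix(A, T)  # shape ((T+1)*n, (T+1)*n)
F_u = shift_block_matrix(B, T)  # shape ((T+1)*n, (T+1)*m)

# Stacked form: x = F_x x + F_u u + w, with signals flattened time step by time step.
x_vec, u_vec, w_vec = x.reshape(-1), u.reshape(-1), w.reshape(-1)
residual = x_vec - (F_x @ x_vec + F_u @ u_vec + w_vec)

# I - F_x is block lower triangular with identity diagonal blocks, hence invertible,
# so the trajectory is uniquely determined by (u, w).
x_solved = np.linalg.solve(np.eye((T + 1) * n) - F_x, F_u @ u_vec + w_vec)
print(np.abs(residual).max(), np.abs(x_solved - x_vec).max())
```

Because $\mathbf{F}$ is strictly causal, the truncated matrix $I - F_x$ is always invertible; this is the finite-horizon counterpart of the statement that each pair $(\mathbf{u}, \mathbf{w})$ produces a unique state sequence.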

We consider disturbances with support $\mathcal{W}_t \subseteq \mathbb{R}^n$ following a random vector distribution $\mathcal{D}_t$, that is, $w_t \in \mathcal{W}_t$ and $w_t \sim \mathcal{D}_t$ for every $t = 0, 1, \ldots$. In order to control the behavior of system (1), we consider nonlinear, state-feedback, time-varying control policies

\begin{equation}
\mathbf{u} = \mathbf{K}(\mathbf{x}) = (K_0(x_0), K_1(x_{1:0}), \ldots, K_t(x_{t:0}), \ldots)\,, \tag{4}
\end{equation}

where $\mathbf{K}: \ell^n \rightarrow \ell^m$ is a causal operator to be designed. Note that the controller $\mathbf{K}$ can be dynamic, as $K_t$ can depend on the whole past history of the system state. Since for each $\mathbf{w} \in \ell^n$ and $\mathbf{u} \in \ell^m$ the system (1) produces a unique state sequence $\mathbf{x} \in \ell^n$, equation (2) defines a unique transition operator

\begin{equation*}
\bm{\mathcal{F}}: (\mathbf{u}, \mathbf{w}) \mapsto \mathbf{x}\,,
\end{equation*}

which provides an input-to-state model of system (1). Similarly, for each $\mathbf{w} \in \ell^n$ the closed-loop system (1)-(4) produces unique trajectories. Hence, the closed-loop mapping $\mathbf{w} \mapsto (\mathbf{x}, \mathbf{u})$ is well-defined. Specifically, for a system $\mathbf{F}$ and a controller $\mathbf{K}$, we denote the corresponding induced closed-loop operators $\mathbf{w} \mapsto \mathbf{x}$ and $\mathbf{w} \mapsto \mathbf{u}$ as $\bm{\Phi}^{\mathbf{x}}[\mathbf{F}, \mathbf{K}]$ and $\bm{\Phi}^{\mathbf{u}}[\mathbf{F}, \mathbf{K}]$, respectively. Therefore, we have $\mathbf{x} = \bm{\Phi}^{\mathbf{x}}[\mathbf{F}, \mathbf{K}](\mathbf{w})$ and $\mathbf{u} = \bm{\Phi}^{\mathbf{u}}[\mathbf{F}, \mathbf{K}](\mathbf{w})$ for all $\mathbf{w} \in \ell^n$.
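
On a finite horizon, the closed-loop operators can be evaluated by simply rolling out (1) and (4) in time. The sketch below is a minimal illustration under stated assumptions: the function `closed_loop_maps`, the dynamics `f`, and the static policy `K` are hypothetical choices, with $n = m = 2$ so that the input can be added directly to the state.

```python
import numpy as np

def closed_loop_maps(f, K, w):
    """Evaluate (Phi^x, Phi^u) on a finite-horizon disturbance w of shape (T+1, n).

    f(t, x_past, u_past) returns f_t(x_{t-1:0}, u_{t-1:0});
    K(t, x_hist) returns K_t(x_{t:0}).
    """
    xs, us = [], []
    for t in range(w.shape[0]):
        drift = np.zeros(w.shape[1]) if t == 0 else f(t, np.array(xs), np.array(us))
        xs.append(drift + w[t])        # x_0 = w_0 because f_0 = 0
        us.append(K(t, np.array(xs)))  # causal: uses x_0, ..., x_t only
    return np.array(xs), np.array(us)

def f(t, x_past, u_past):
    """Hypothetical dynamics depending only on the most recent state and input."""
    return 0.5 * np.tanh(x_past[-1]) + u_past[-1]

def K(t, x_hist):
    """Hypothetical static state-feedback policy."""
    return -0.2 * x_hist[-1]

rng = np.random.default_rng(0)
w = rng.normal(size=(6, 2))  # w[0] plays the role of x_0
x, u = closed_loop_maps(f, K, w)

# Causality of the closed-loop maps: changing w after time 3 leaves x_0..x_3 unchanged.
w_pert = w.copy()
w_pert[4:] += 1.0
x_pert, u_pert = closed_loop_maps(f, K, w_pert)
print(np.allclose(x[:4], x_pert[:4]), np.allclose(u[:4], u_pert[:4]))
```

Each disturbance sequence yields exactly one pair of trajectories, which is the finite-horizon counterpart of the well-posedness argument above. Since both `f` and `K` accept the full history, dynamic controllers fit the same interface.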

Definition 1.

The closed-loop system (1)-(4) is $\ell_p$-stable if $\bm{\Phi}^{\mathbf{x}}[\mathbf{F}, \mathbf{K}]$ and $\bm{\Phi}^{\mathbf{u}}[\mathbf{F}, \mathbf{K}]$ are in $\mathcal{L}_p$.

Our goal is to synthesize a control policy 𝐊𝐊\mathbf{K}bold_K solving the following problem.

Problem 1 (Performance boosting).

Assume that $\bm{\mathcal{F}}$ lies in $\mathcal{L}_{p}$. Find $\mathbf{K}$ solving the finite-horizon Nonlinear Optimal Control (NOC) problem

\begin{align}
\min_{\mathbf{K}(\cdot)} \quad & \mathbb{E}_{w_{T:0}}\left[L(x_{T:0},u_{T:0})\right] \tag{5a}\\
\text{s.t.} \quad & x_{t}=f_{t}(x_{t-1:0},u_{t-1:0})+w_{t}\,,~~w_{0}=x_{0}\,, \notag\\
& u_{t}=K_{t}(x_{t:0})\,,~~\forall t=0,1,\ldots\,, \notag\\
& (\bm{\Phi}^{\mathbf{x}}[\mathbf{F},\mathbf{K}],\bm{\Phi}^{\mathbf{u}}[\mathbf{F},\mathbf{K}])\in\mathcal{L}_{p}\,, \tag{5b}
\end{align}

where $L(\cdot)$ defines a loss over realized trajectories $x_{T:0}$ and $u_{T:0}$, and the expectation $\mathbb{E}_{w_{T:0}}[\cdot]$ removes the effect of disturbances $w_{T:0}$ on the realized values of the loss. (Footnote 2: Another common choice is to use $\max_{w_{T:0}\in\mathcal{W}_{T:0}}[\cdot]$ instead of the expectation. Other useful choices include $\operatorname{Var}_{w_{T:0}}[\cdot]$, $\operatorname{CVAR}_{w_{T:0}}[\cdot]$, and weighted combinations of all the above. In practice, one can approximate the chosen operator that removes the effect of disturbances from the cost by performing multiple experiments.)

The main feature of (5) is that the cost is optimized over the finite horizon $0,\ldots,T$, but under the strict requirement that the closed-loop system is stable when it evolves over $0,\ldots,+\infty$. In other words, the feedback controller must preserve stability of $\bm{\mathcal{F}}$, and its role is to boost the performance of the system in the transient $0,\ldots,T$. As will become clear in the sequel, we consider iterative control design algorithms based on gradient descent that are fail-safe, in the sense that they search in sets of controllers that are stability-preserving by design. This guarantees closed-loop stability during the optimization of the policy parameters. Note also that, as is standard in NOC, we do not expect gradient descent to find the globally optimal solution for any initialization; this is generally impossible for problems beyond Linear Quadratic Gaussian (LQG) control, which enjoys convexity of the cost and linearity of the optimal policies [32, 33]. Furthermore, the expected value in (5a) can seldom be computed (Footnote 3: For instance, because it is too costly or the distribution $\mathcal{D}$ is unknown.) and is approximated by using samples of $w_{T:0}$. Fail-safe design guarantees that, in spite of all these limitations, closed-loop stability is never lost.
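The sample-based approximation of the expected cost in (5a) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the scalar dynamics `f`, the feedback `policy`, the quadratic `loss`, and the Gaussian sampler `sample_w` are all hypothetical choices, and the dynamics are assumed to depend only on the most recent state and input.

```python
import math
import random


def rollout(f, policy, w):
    """Simulate x_t = f(x_{t-1}, u_{t-1}) + w_t with x_0 = w_0 and u_t = policy(x_{t:0})."""
    x, u = [], []
    for t, w_t in enumerate(w):
        # f_0 = 0, so the initial state equals the first disturbance sample.
        drift = f(x[t - 1], u[t - 1]) if t > 0 else 0.0
        x.append(drift + w_t)
        u.append(policy(x))  # the policy sees the state history x_{t:0}
    return x, u


def empirical_cost(f, policy, loss, sample_w, num_samples, rng):
    """Average the loss over num_samples sampled disturbance sequences."""
    total = 0.0
    for _ in range(num_samples):
        x, u = rollout(f, policy, sample_w(rng))
        total += loss(x, u)
    return total / num_samples


# Hypothetical ingredients, chosen only for illustration.
def f(x_prev, u_prev):
    return 0.5 * math.tanh(x_prev) + u_prev  # a stable scalar system


def policy(x_hist):
    return -0.2 * x_hist[-1]  # a causal state-feedback policy


def loss(x, u):
    return sum(xi**2 for xi in x) + 0.1 * sum(ui**2 for ui in u)


def sample_w(rng, horizon=20):
    return [rng.gauss(0.0, 0.1) for _ in range(horizon)]


cost = empirical_cost(f, policy, loss, sample_w, num_samples=100, rng=random.Random(0))
```

Replacing the average by a maximum or a variance over the sampled losses would give empirical counterparts of the alternatives mentioned in Footnote 2.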

III Unconstrained Parametrization of all Stability-preserving Controllers

As a preliminary step towards fail-safe design for stable systems, we show how to parametrize all stability-preserving policies by using an IMC control architecture [24, 25], depending on an operator $\bm{\mathcal{M}}$ that can be freely chosen in $\mathcal{L}_{p}$. Specifically, the block diagram of the proposed control architecture is represented in Figure 1 and it includes a copy of the system dynamics, which is used for computing the estimate $\widehat{\mathbf{w}}$ of the disturbance $\mathbf{w}$.

Figure 1: IMC architecture parametrizing all stabilizing controllers in terms of one freely chosen operator $\bm{\mathcal{M}}\in\mathcal{L}_{p}$.

We are now in a position to introduce the main result.

Theorem 1.

Assume that the operator $\bm{\mathcal{F}}$ is $\ell_{p}$-stable, i.e., $\mathbf{x}\in\ell_{p}$ if $(\mathbf{w},\mathbf{u})\in\ell_{p}$, and consider the evolution of (2) where $\mathbf{u}$ is chosen as

$$\mathbf{u}=\bm{\mathcal{M}}(\mathbf{x}-\mathbf{F}(\mathbf{x},\mathbf{u}))\,, \tag{6}$$

for a causal operator $\bm{\mathcal{M}}:\ell^{n}\rightarrow\ell^{m}$. Let $\mathbf{K}$ be the operator such that $\mathbf{u}=\mathbf{K}(\mathbf{x})$ is equivalent to (6). (Footnote 4: This operator always exists because $\mathbf{F}(\mathbf{x},\mathbf{u})$ is strictly causal. Hence $u_{t}$ depends on the inputs $u_{t-1:0}$ and can be computed recursively from past inputs and $x_{t:0}$; see formula (11).) The following two statements hold true.

  1. If $\bm{\mathcal{M}}\in\mathcal{L}_{p}$, then the closed-loop system is $\ell_{p}$-stable.

  2. If there is a causal policy $\mathbf{C}$ such that $\bm{\Phi}^{\mathbf{x}}[\mathbf{F},\mathbf{C}],~\bm{\Phi}^{\mathbf{u}}[\mathbf{F},\mathbf{C}]\in\mathcal{L}_{p}$, then

     $$\bm{\mathcal{M}}=\bm{\Phi}^{\mathbf{u}}[\mathbf{F},\mathbf{C}]\,, \tag{7}$$

     gives $\mathbf{K}=\mathbf{C}$.

Proof.

We prove 1). For compactness, define $\widehat{\mathbf{w}}=\mathbf{x}-\mathbf{F}(\mathbf{x},\mathbf{u})$. As highlighted in [25], since there is no model mismatch between the plant $\bm{\mathcal{F}}$ and the model $\mathbf{F}$ used to define $\widehat{\mathbf{w}}$, one has $\widehat{\mathbf{w}}=\mathbf{w}$, hence opening the loop. More specifically, from Figure 1 and Equation (2) one has

$$\widehat{\mathbf{w}}=-\mathbf{F}(\mathbf{x},\mathbf{u})+\mathbf{F}(\mathbf{x},\mathbf{u})+\mathbf{w}=\mathbf{w}\,. \tag{8}$$

Therefore, by definition of the closed-loop maps, one has $\bm{\Phi}^{\mathbf{u}}[\mathbf{F},\mathbf{K}]=\bm{\mathcal{M}}$ and $\bm{\Phi}^{\mathbf{x}}[\mathbf{F},\mathbf{K}](\mathbf{w})=\mathbf{F}(\mathbf{x},\bm{\mathcal{M}}(\mathbf{w}))+\mathbf{w}$, $\forall\mathbf{w}\in\ell_{p}$. When $\mathbf{w}\in\ell_{p}$, one has $\bm{\Phi}^{\mathbf{u}}[\mathbf{F},\mathbf{K}](\mathbf{w})\in\ell_{p}$ because $\bm{\mathcal{M}}\in\mathcal{L}_{p}$. Moreover, $\bm{\mathcal{M}}\in\mathcal{L}_{p}$ and $\bm{\mathcal{F}}\in\mathcal{L}_{p}$ imply that the operator $\mathbf{w}\mapsto\mathbf{x}$ defined by the composition of the operators $\mathbf{w}\mapsto(\bm{\mathcal{M}}(\mathbf{w}),\mathbf{w})$ and $\bm{\mathcal{F}}$ is in $\mathcal{L}_{p}$ as well. This is due to the property that the composition of operators in $\mathcal{L}_{p}$ is in $\mathcal{L}_{p}$.

We prove 2). Set, for short, $\bm{\Psi}^{\mathbf{x}}=\bm{\Phi}^{\mathbf{x}}[\mathbf{F},\mathbf{C}]$, $\bm{\Psi}^{\mathbf{u}}=\bm{\Phi}^{\mathbf{u}}[\mathbf{F},\mathbf{C}]$, $\bm{\Upsilon}^{\mathbf{x}}=\bm{\Phi}^{\mathbf{x}}[\mathbf{F},\mathbf{K}]$, and $\bm{\Upsilon}^{\mathbf{u}}=\bm{\Phi}^{\mathbf{u}}[\mathbf{F},\mathbf{K}]$. By assumption, one has $\bm{\mathcal{M}}=\bm{\Psi}^{\mathbf{u}}$ and, since $\bm{\Psi}^{\mathbf{u}}\in\mathcal{L}_{p}$, also $\bm{\mathcal{M}}\in\mathcal{L}_{p}$. By definition, $\bm{\Upsilon}^{\mathbf{u}}$ is the operator $\mathbf{w}\mapsto\mathbf{u}$ and, from (8) and Figure 1, it coincides with $\bm{\mathcal{M}}$. Hence

$$\bm{\Psi}^{\mathbf{u}}=\bm{\Upsilon}^{\mathbf{u}}\,. \tag{9}$$

It remains to prove that $\bm{\Upsilon}^{\mathbf{x}}=\bm{\Psi}^{\mathbf{x}}$. Similar to [22], we proceed by induction. First, we show that $\Psi^{x}_{0}=\Upsilon^{x}_{0}$. Since $f_{0}=0$ and $w_{0}=x_{0}$, one has from (1) that the closed-loop map $w_{0}\mapsto x_{0}$ is the identity, irrespective of the controller. Therefore $\Upsilon_{0}^{x}=\Psi_{0}^{x}=I$. Assume now that, for a positive $j\in\mathbb{N}$, we have $\Upsilon^{x}_{i}=\Psi^{x}_{i}$ for all $0\leq i\leq j$. Since $(\bm{\Upsilon}^{\mathbf{x}},\bm{\Upsilon}^{\mathbf{u}})$ and $(\bm{\Psi}^{\mathbf{x}},\bm{\Psi}^{\mathbf{u}})$ are closed-loop maps, from (2) they verify

$$\Upsilon^{x}_{j+1}=F_{j+1}(\Upsilon^{x}_{j:0},\Upsilon^{u}_{j:0})+I\,,\quad\Psi^{x}_{j+1}=F_{j+1}(\Psi^{x}_{j:0},\Psi^{u}_{j:0})+I\,. \tag{10}$$

But, from (9), one has $\Psi^{u}_{j:0}=\Upsilon^{u}_{j:0}$ and, by using the inductive assumption, one obtains $\Upsilon^{x}_{j+1}=\Psi^{x}_{j+1}$. This implies $\mathbf{K}=\mathbf{C}$. ∎

Several comments are in order. First, Theorem 1 is about nominal stability only, as there is no model mismatch between the plant model and the one used in the controller. We analyze robust stability in Section IV. Second, it is well known that many IMC architectures are sufficient for preserving stability, both in the linear [24] and the nonlinear [25] case. (Footnote 5: Note, however, that IMC in [25] is developed in terms of continuous-time nonlinear input-output models, for which the effect of process noise is difficult to analyze. Moreover, the control objective there is to make the plant output track a reference signal, which raises the problem of approximating inverses of nonlinear operators. In our work, we use instead discrete-time input-to-state models and analyze the closed-loop maps from process noise to control inputs and system states. Moreover, our goal is to solve optimal control rather than tracking problems.) It is also known that, in the LTI setting, IMC is necessary for preserving stability [34] and provides an alternative to the Youla-Kučera parametrization [35]. In this respect, Theorem 1 provides a necessary condition for preserving stability also for nonlinear systems. This result is perhaps not surprising given that necessary and sufficient conditions for stabilizing wide classes of input-output nonlinear models, in the spirit of the Youla-Kučera parametrization, have been derived since the 1980s [27]. However, these controllers are not conceived in the IMC form.

Following [24, 25], we argue that the IMC structure facilitates the design of performance-boosting policies. Indeed, it is straightforward to deploy controllers using the block-diagram structure shown in Figure 1. In equation form, for a chosen operator $\bm{\mathcal{M}}$, one simply computes the control input as follows:

\begin{align}
\widehat{w}_{t}&=x_{t}-f_{t}(x_{t-1:0},u_{t-1:0})\,, \tag{11a}\\
u_{t}&=\mathcal{M}_{t}(\widehat{w}_{t:0})\,. \tag{11b}
\end{align}

Moreover, Theorem 1 highlights that it is sufficient to search in the space of operators $\bm{\mathcal{M}}\in\mathcal{L}_{p}$ for describing all and only performance-boosting policies. While finding a parametrization of all operators $\bm{\mathcal{M}}\in\mathcal{L}_{p}$ might be prohibitive, we will show in Section V that one can use NNs for describing broad subsets of these operators. In addition, the IMC structure lends itself to the development of policies that enjoy a distributed structure (see Section IV).
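The recursion (11) can be sketched in a few lines. The example below is only an illustration under stated assumptions: the scalar dynamics `f` and the operator `M` (a stable first-order filter followed by a saturating map, intended to lie in $\mathcal{L}_{p}$) are hypothetical, and the model is assumed to match the plant exactly, as in Theorem 1.

```python
import math


def f(x_prev, u_prev):
    """Hypothetical stable scalar dynamics, known exactly to the controller."""
    return 0.5 * math.tanh(x_prev) + u_prev


def M(w_hat_hist):
    """Hypothetical operator: a stable linear filter followed by a saturating map."""
    xi = 0.0
    for w_hat_k in w_hat_hist:
        xi = 0.5 * xi + w_hat_k
    return -0.3 * math.tanh(xi)


def simulate_imc(w):
    """Closed loop of x_t = f(x_{t-1}, u_{t-1}) + w_t with the controller (11)."""
    x, u, w_hat = [], [], []
    for t, w_t in enumerate(w):
        pred = f(x[t - 1], u[t - 1]) if t > 0 else 0.0  # internal model, f_0 = 0
        x.append(pred + w_t)        # plant update
        w_hat.append(x[t] - pred)   # (11a): reconstructed disturbance
        u.append(M(w_hat))          # (11b): u_t = M_t(w_hat_{t:0})
    return x, u, w_hat


w = [1.0, -0.5, 0.25] + [0.0] * 97  # a finite-energy disturbance
x, u, w_hat = simulate_imc(w)
```

In this sketch the reconstructed sequence `w_hat` matches `w` up to rounding, so the input is effectively an open-loop function of the disturbance, which is the mechanism used in the proof of Theorem 1.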

III-1 The case of LTI systems with nonlinear costs

Consider the linear system (3) and let $z$ denote the time-shift operator. When the system is asymptotically stable, the classical Youla parametrization [35] states that all linear state-feedback stabilizing control policies $\mathbf{u}=\mathbf{K}\mathbf{x}$ can be written as

$$\mathbf{u}=\mathbf{Q}(z)\mathbf{x}-\frac{\mathbf{Q}(z)}{z}\left(A\mathbf{x}+B\mathbf{u}\right)\,,\quad\mathbf{Q}(z)\in\mathcal{TF}_{s}\,, \tag{12}$$

where $\mathbf{Q}(z)$ is the so-called Youla parameter. Here, $\mathcal{TF}_{s}$ denotes the set of stable transfer matrices, that is, the set of matrices whose scalar entries are stable transfer functions. The class of linear control policies is globally optimal for standard LQG problems, and it allows optimizing over $\mathbf{Q}\in\mathcal{TF}_{s}$ using simple pole approximations and convex programming; we refer to [36, 37] for state-of-the-art results. However, nonlinear policies can perform significantly better when the controller is distributed [38] or the cost function is nonlinear. As an immediate corollary of Theorem 1, and in accordance with the core contribution of [39], we have the following result for linear systems controlled by nonlinear policies.

Corollary 1.

Consider the linear system (3) and assume that it is asymptotically stable. Then, all and only control policies that make the closed-loop system $\ell_{p}$-stable are expressed as

$$\mathbf{u}=\bm{\mathcal{M}}\left(\mathbf{x}-\frac{\left(A\mathbf{x}+B\mathbf{u}\right)}{z}\right)\,, \tag{13}$$

where $\bm{\mathcal{M}}\in\mathcal{L}_{p}$.

Proof.

The proof follows from Theorem 1 upon realizing that the asymptotic stability of system (3) implies that the corresponding operator $\bm{\mathcal{F}}$ is in $\mathcal{L}_{p}$, for any $p\geq 1$. ∎

In conclusion, as expected, the linear Youla parametrization (12) is a special case of the proposed parametrization (13) with $\bm{\mathcal{M}}=\mathbf{Q}$ and $\mathbf{Q}\in\mathcal{TF}_{s}$.
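To make this special case explicit, the short derivation below expands (13) for a linear choice of the operator. It assumes, as the notation in (12) and (13) suggests, that $1/z$ acts as a one-step delay and that system (3) has the form $x_{t}=Ax_{t-1}+Bu_{t-1}+w_{t}$.

```latex
% Choose M = Q(z), a stable linear operator.
% Linearity lets Q(z) distribute over the difference inside (13):
\begin{align*}
\mathbf{u}
  &= \mathbf{Q}(z)\left(\mathbf{x}-\frac{A\mathbf{x}+B\mathbf{u}}{z}\right) \\
  &= \mathbf{Q}(z)\mathbf{x}-\frac{\mathbf{Q}(z)}{z}\left(A\mathbf{x}+B\mathbf{u}\right),
\end{align*}
% which is (12). In the time domain, the argument of the operator is the
% reconstructed disturbance
%   \widehat{w}_t = x_t - A x_{t-1} - B u_{t-1}, \qquad \widehat{w}_0 = x_0 .
```

For a nonlinear $\bm{\mathcal{M}}$ the first step is unavailable, which is why (13) keeps the operator applied to the whole reconstructed disturbance.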

III-2 Relationships with [22] and nonlinear SLS

In [22], we provided a slight generalization of Theorem 1 and the results in Section III-1 by also considering unstable systems $\mathbf{x}=\tilde{\mathbf{F}}(\mathbf{x},\mathbf{u})+\mathbf{w}$ for which a pre-stabilizing controller $\mathbf{K}^{\prime}$ exists, so that the overall policy is

$$\mathbf{u}=\mathbf{K}^{\prime}(\mathbf{x})+\bm{\mathcal{M}}(\widehat{\mathbf{w}})\,. \tag{14}$$

By letting $\mathbf{F}(\mathbf{x},\mathbf{u})=\tilde{\mathbf{F}}(\mathbf{x},\mathbf{K}^{\prime}(\mathbf{x})+\mathbf{u})$, and assuming that both $\bm{\mathcal{F}}$ and $\mathbf{K}^{\prime}$ lie in $\mathcal{L}_{p}$, Theorem 1 coincides with Theorem 2 in [22]. However, when $\mathbf{K}^{\prime}\not\in\mathcal{L}_{p}$, Theorem 2 in [22] highlights that $\bm{\mathcal{M}}\in\mathcal{L}_{p}$ may no longer be a necessary condition for closed-loop $\ell_{p}$-stability, while still being sufficient.
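The pre-stabilized policy (14) can be illustrated on a toy example. The sketch below rests on several assumptions that are not taken from [22]: the unstable scalar plant `f_tilde`, the static pre-stabilizing gain `k_prime`, and the memoryless operator `M` are hypothetical, and the disturbance estimate is assumed to be reconstructed from the unstable model and the total applied input.

```python
import math


def f_tilde(x_prev, u_prev):
    """Hypothetical open-loop unstable plant model (pole at 1.2)."""
    return 1.2 * x_prev + u_prev


def k_prime(x_t):
    """Hypothetical pre-stabilizing feedback: the pre-stabilized pole is 1.2 - 0.9 = 0.3."""
    return -0.9 * x_t


def M(w_hat_hist):
    """Hypothetical memoryless operator with |M(w)| <= 0.5 |w|."""
    return 0.5 * math.tanh(w_hat_hist[-1])


def simulate(w):
    """Closed loop of x_t = f_tilde(x_{t-1}, u_{t-1}) + w_t under the policy (14)."""
    x, u, w_hat = [], [], []
    for t, w_t in enumerate(w):
        pred = f_tilde(x[t - 1], u[t - 1]) if t > 0 else 0.0
        x.append(pred + w_t)                 # plant update
        w_hat.append(x[t] - pred)            # disturbance reconstruction
        u.append(k_prime(x[t]) + M(w_hat))   # u = K'(x) + M(w_hat)
    return x, u, w_hat


w = [1.0, 0.5, -0.5] + [0.0] * 77  # a finite-energy disturbance
x, u, w_hat = simulate(w)
```

Here the additional term only reshapes the response of the already stabilized loop, so the state and input still decay once the disturbance vanishes.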

Moreover, as highlighted in [22], there is a deep link between Theorem 1 and the SLS parametrization of stabilizing controllers [40, 29]. The idea behind the SLS approach [40, 29] is to circumvent the difficulty of characterizing stabilizing controllers by instead directly designing stable closed-loop maps. Let us define the set of all achievable closed-loop maps for system $\mathbf{F}$ as

$$\mathcal{CL}[\mathbf{F}]=\{(\bm{\Phi}^{\mathbf{x}}[\mathbf{F},\mathbf{K}],\bm{\Phi}^{\mathbf{u}}[\mathbf{F},\mathbf{K}])~|~\mathbf{K}\text{ is causal}\}\,, \tag{15}$$

and the set of all achievable and stable closed-loop maps as

$$\mathcal{CL}_{p}[\mathbf{F}]=\{(\bm{\Psi}^{\mathbf{x}},\bm{\Psi}^{\mathbf{u}})\in\mathcal{CL}[\mathbf{F}]~|~(\bm{\Psi}^{\mathbf{x}},\bm{\Psi}^{\mathbf{u}})\in\mathcal{L}_{p}\}\,. \tag{16}$$

Note that, if $(\bm{\Psi}^{\mathbf{x}},\bm{\Psi}^{\mathbf{u}})\in\mathcal{CL}_{p}[\mathbf{F}]$, then $\mathbf{x}=\bm{\Psi}^{\mathbf{x}}(\mathbf{w})\in\ell_{p}^{n}$ and $\mathbf{u}=\bm{\Psi}^{\mathbf{u}}(\mathbf{w})\in\ell_{p}^{m}$ for all $\mathbf{w}\in\ell_{p}^{n}$. Based on Theorem III.3 of [29], and adding the requirement that the closed-loop maps must belong to $\mathcal{L}_{p}$, we summarize the main SLS result for nonlinear discrete-time systems.

Theorem 2 (Nonlinear SLS parametrization [29]).

The following two statements hold true.

  1. The set $\mathcal{CL}_{p}[\mathbf{F}]$ of all achievable and stable closed-loop responses admits the following characterization:

     \begin{align}
     \mathcal{CL}_{p}[\mathbf{F}]=\{&(\bm{\Psi}^{\mathbf{x}},\bm{\Psi}^{\mathbf{u}})~|~(\bm{\Psi}^{\mathbf{x}},\bm{\Psi}^{\mathbf{u}})\text{ are causal}\,, \tag{17a}\\
     &\bm{\Psi}^{\mathbf{x}}=\mathbf{F}(\bm{\Psi}^{\mathbf{x}},\bm{\Psi}^{\mathbf{u}})+\mathbf{I}\,, \tag{17b}\\
     &(\bm{\Psi}^{\mathbf{x}},\bm{\Psi}^{\mathbf{u}})\in\mathcal{L}_{p}\}\,. \tag{17c}
     \end{align}

  2. For any $(\bm{\Psi}^{\mathbf{x}},\bm{\Psi}^{\mathbf{u}})\in\mathcal{CL}_{p}[\mathbf{F}]$, the operator $\bm{\Psi}^{\mathbf{x}}$ is invertible and the causal controller

     $$\mathbf{u}=\mathbf{K}(\mathbf{x})=\bm{\Psi}^{\mathbf{u}}\left((\bm{\Psi}^{\mathbf{x}})^{-1}(\mathbf{x})\right)\,, \tag{18}$$

     is the only one that achieves the stable closed-loop responses $(\bm{\Psi}^{\mathbf{x}},\bm{\Psi}^{\mathbf{u}})$.
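The controller (18) may look abstract, so the following sketch evaluates it on a toy example. Everything here is a hypothetical illustration rather than the construction of [29]: the scalar dynamics `f` and the memoryless map `psi_u` are assumed, and `psi_x` is built to satisfy (17b) by solving the recursion $x_{t}=f(x_{t-1},u_{t-1})+w_{t}$.

```python
import math


def f(x_prev, u_prev):
    """Hypothetical stable scalar dynamics."""
    return 0.5 * math.tanh(x_prev) + u_prev


def psi_u(w):
    """Hypothetical stable causal map from disturbances to inputs."""
    return [-0.3 * math.tanh(w_t) for w_t in w]


def psi_x(w):
    """State response compatible with (17b): x_t = f(x_{t-1}, u_{t-1}) + w_t."""
    u = psi_u(w)
    x = []
    for t, w_t in enumerate(w):
        x.append((f(x[t - 1], u[t - 1]) if t > 0 else 0.0) + w_t)
    return x


def psi_x_inverse(x):
    """Causal inverse of psi_x: w_t = x_t - f(x_{t-1}, u_{t-1}), with u rebuilt from past w."""
    w = []
    for t, x_t in enumerate(x):
        if t == 0:
            w.append(x_t)  # f_0 = 0, hence w_0 = x_0
        else:
            u_prev = psi_u(w)[t - 1]
            w.append(x_t - f(x[t - 1], u_prev))
    return w


def controller(x):
    """Controller (18): u = Psi^u((Psi^x)^{-1}(x))."""
    return psi_u(psi_x_inverse(x))


w = [1.0, -0.4, 0.2, 0.0, 0.0]
x = psi_x(w)
u = controller(x)
```

In this example, inverting the state response amounts to the same disturbance reconstruction used by the IMC controller, which gives one concrete view of the link between the two parametrizations.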

Theorem 2 clarifies that any policy $\mathbf{K}(\mathbf{x})$ achieving $\ell_{p}$-stable closed-loop maps can be described in terms of two causal operators $(\bm{\Psi}^{\mathbf{x}},\bm{\Psi}^{\mathbf{u}})\in\mathcal{L}_{p}$ complying with the nonlinear functional equality (17b). Therefore, the NOC problem admits an equivalent Nonlinear SLS (N-SLS) formulation:

$$\begin{aligned}
\operatorname{N-SLS:}\quad \min_{(\bm{\Psi}^{\mathbf{x}},\bm{\Psi}^{\mathbf{u}})}\quad & \mathbb{E}_{w_{T:0}}\left[L(x_{T:0},u_{T:0})\right]\\
\operatorname{s.t.}\quad & x_{t}=\Psi^{x}_{t}(w_{t:0})\,,\quad u_{t}=\Psi^{u}_{t}(w_{t:0})\,,\\
& (\bm{\Psi}^{\mathbf{x}},\bm{\Psi}^{\mathbf{u}})\in\mathcal{CL}_{p}[\mathbf{F}]\,,\quad t=0,1,\ldots
\end{aligned} \tag{$\star$}$$

According to Theorem 2, the constraint $(\bm{\Psi}^{\mathbf{x}},\bm{\Psi}^{\mathbf{u}})\in\mathcal{CL}_{p}[\mathbf{F}]$ is equivalent to requiring that $(\bm{\Psi}^{\mathbf{x}},\bm{\Psi}^{\mathbf{u}})$ are causal and verify (17b)-(17c). The constraint (17b) simply defines the operator $\bm{\Psi}^{\mathbf{x}}$ in terms of $\bm{\Psi}^{\mathbf{u}}$, and it can be computed explicitly because $\mathbf{F}$ is strictly causal. The main challenge is to comply with (17c). Indeed, it is hard to generate $\bm{\Psi}^{\mathbf{u}}\in\mathcal{L}_{p}$ such that the corresponding $\bm{\Psi}^{\mathbf{x}}$ satisfies $\bm{\Psi}^{\mathbf{x}}\in\mathcal{L}_{p}$. The paper [29] suggests directly searching over $\ell_{p}$-stable operators $(\bm{\Psi}^{\mathbf{x}},\bm{\Psi}^{\mathbf{u}})$ and abandoning the goal of complying with (17b) exactly. One can then study robust stability when (17b) only holds approximately as per Theorem IV.2 in [29]. However, with the exception of polynomial systems [41], this way of proceeding may result in conservative control policies or fail to produce a stabilizing controller. Instead, for the case of stable or pre-stabilized systems, Theorem 1 can be seen as a way of parametrizing all stabilizing controllers that completely circumvents the problem of fulfilling (17b)-(17c).
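As a rough illustration of why (17b) can be evaluated explicitly, the sketch below assumes that (17b) reads $\bm{\Psi}^{\mathbf{x}}=\mathbf{F}(\bm{\Psi}^{\mathbf{x}},\bm{\Psi}^{\mathbf{u}})+\mathbf{I}$ (which is what the mismatch operator (31) suggests) and that $x_0=w_0$. Given any causal map $\bm{\Psi}^{\mathbf{u}}$, strict causality of $\mathbf{F}$ lets the state response be generated by forward recursion. The function names, the scalar plant, and the static input map are hypothetical choices made for this example only.

```python
from typing import Callable, List, Sequence, Tuple

Dynamics = Callable[[int, Sequence[float], Sequence[float]], float]
InputMap = Callable[[int, Sequence[float]], float]


def rollout_responses(f: Dynamics, psi_u: InputMap,
                      w: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Evaluate x_t = Psi^x_t(w_{t:0}) and u_t = Psi^u_t(w_{t:0}) for all t."""
    xs: List[float] = []
    us: List[float] = []
    for t in range(len(w)):
        if t == 0:
            x_t = w[0]  # assumed convention: x_0 = w_0
        else:
            # Strict causality: f only sees x_{t-1:0} and u_{t-1:0}.
            x_t = f(t, xs, us) + w[t]
        xs.append(x_t)
        us.append(psi_u(t, w[: t + 1]))  # causal: u_t depends on w_{t:0} only
    return xs, us


# Hypothetical scalar example: a stable plant and a static map Psi^u.
def f_example(t: int, xs: Sequence[float], us: Sequence[float]) -> float:
    return 0.5 * xs[-1] + us[-1]


def psi_u_example(t: int, w_hist: Sequence[float]) -> float:
    return -0.2 * w_hist[-1]


xs, us = rollout_responses(f_example, psi_u_example, [1.0, 0.0, 0.0, 0.0])
```

The recursion itself is cheap; the difficulty discussed above lies in choosing $\bm{\Psi}^{\mathbf{u}}$ so that the resulting $\bm{\Psi}^{\mathbf{x}}$ also belongs to $\mathcal{L}_{p}$.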

IV Beyond Closed-loop Stability: Handling Model Uncertainty and Distributed Architectures

This section tackles the performance-boosting problem (Problem 1) under real-world constraints that go beyond closed-loop stability. First, Theorem 1 requires perfect knowledge of the plant for controller design, whereas in practice closed-loop stability must be ensured despite an imperfect model. Second, control policies in large-scale applications such as power grids and traffic systems are inherently distributed: they rely solely on local sensor data and communication, which makes network-level robustness and stability significantly harder to achieve.

IV-A Robustness against model-mismatch

Let us denote the nominal model available for design as $\widehat{\mathbf{F}}(\mathbf{x},\mathbf{u})$ and the real unknown plant as

$$\mathbf{F}(\mathbf{x},\mathbf{u})=\widehat{\mathbf{F}}(\mathbf{x},\mathbf{u})+\bm{\Delta}(\mathbf{x},\mathbf{u})\,, \tag{19}$$

where $\bm{\Delta}$ is a strictly causal operator representing the model mismatch. Let $\delta_{t}(x_{t-1:0},u_{t-1:0})$ be the time representation of the mismatch operator $\bm{\Delta}$. Since for each sequence of disturbances $\mathbf{w}\in\ell^{n}$ and inputs $\mathbf{u}\in\ell^{m}$ the dynamics represented by (1) with $f_{t}(x_{t-1:0},u_{t-1:0})$ replaced by $\widehat{f}_{t}(x_{t-1:0},u_{t-1:0})+\delta_{t}(x_{t-1:0},u_{t-1:0})$ produce a unique state sequence $\mathbf{x}\in\ell^{n}$, the equation

$$\mathbf{x}=\mathbf{F}(\mathbf{x},\mathbf{u})+\mathbf{w}\,, \tag{20}$$

defines again a unique transition operator $\bm{\mathcal{F}}:(\mathbf{u},\mathbf{w})\mapsto\mathbf{x}$, which provides an input-to-state model of the perturbed system.

Here, we show that when $\bm{\Delta}$ can be described by an $\mathcal{L}_{p}$ operator with finite gain, we can always design operators $\bm{\mathcal{M}}$ with sufficiently small $\mathcal{L}_{p}$-gain that stabilize the real closed-loop system. More specifically, letting $\gamma_{\bm{\Delta}}$ be the maximum $\mathcal{L}_{p}$-gain of the model mismatch $\bm{\Delta}$, it is possible to design controllers $\mathbf{K}$ that comply with the following robust version of the stability constraint (5b):

$$\bm{\Phi}^{*}[\widehat{\mathbf{F}}+\bm{\Delta},\mathbf{K}]\in\mathcal{L}_{p}\,,\quad *\in\{\mathbf{x},\mathbf{u}\}\,,\quad \forall\bm{\Delta}\mid\gamma(\bm{\Delta})\leq\gamma_{\bm{\Delta}}\,. \tag{21}$$

This result, which is given in the next theorem, refers to the control scheme in Figure 2.

Figure 2: The closed-loop system when the nominal model $\widehat{\mathbf{F}}(\mathbf{x},\mathbf{u})$ used in the IMC controller and the real plant $\mathbf{F}(\mathbf{x},\mathbf{u})=\widehat{\mathbf{F}}(\mathbf{x},\mathbf{u})+\bm{\Delta}(\mathbf{x},\mathbf{u})$ differ by the perturbation $\bm{\Delta}\in\mathcal{L}_{p}$. Compared to Figure 1, the blocks have been rearranged to highlight the subsystems used in the small-gain argument adopted in the proof of Theorem 3.
Theorem 3.

Assume that the mismatch operator $\bm{\Delta}$ in (19) has finite $\mathcal{L}_{p}$-gain $\gamma(\bm{\Delta})$. Furthermore, assume that the operator $\bm{\mathcal{F}}$ has finite $\mathcal{L}_{p}$-gain $\gamma(\bm{\mathcal{F}})$. Then, for any $\bm{\mathcal{M}}$ such that

$$\gamma(\bm{\mathcal{M}})<\gamma(\bm{\Delta})^{-1}(\gamma(\bm{\mathcal{F}})+1)^{-1}\,, \tag{22}$$

the control policy given by

\begin{align}
\widehat{w}_{t} &= x_{t}-\widehat{f}_{t}(x_{t-1:0},u_{t-1:0})\,, \tag{23a}\\
u_{t} &= \mathcal{M}_{t}(\widehat{w}_{t:0})\,, \tag{23b}
\end{align}

stabilizes the closed-loop system.

Proof.

We first show that operators $\mathbf{F}$ and $\bm{\mathcal{F}}$ verify

$$\mathbf{F}(\bm{\mathcal{F}}(\mathbf{u},\mathbf{w}),\mathbf{u})=\bm{\mathcal{F}}(\mathbf{u},\mathbf{w})-\mathbf{w}\,. \tag{24}$$

This follows by substituting $\mathbf{x}=\bm{\mathcal{F}}(\mathbf{u},\mathbf{w})$ in (20). We now compute the $\mathcal{L}_{p}$-gain of the operator $\Sigma_{1}:(\mathbf{u},\mathbf{w})\mapsto\widehat{\mathbf{w}}$ in the right frame of Figure 2:

\begin{align}
\widehat{\mathbf{w}} &= \bm{\mathcal{F}}(\mathbf{u},\mathbf{w})-\widehat{\mathbf{F}}(\bm{\mathcal{F}}(\mathbf{u},\mathbf{w}),\mathbf{u}) \notag\\
&= \mathbf{F}(\bm{\mathcal{F}}(\mathbf{u},\mathbf{w}),\mathbf{u})-\widehat{\mathbf{F}}(\bm{\mathcal{F}}(\mathbf{u},\mathbf{w}),\mathbf{u})+\mathbf{w} \notag\\
&= \bm{\Delta}(\bm{\mathcal{F}}(\mathbf{u},\mathbf{w}),\mathbf{u})+\mathbf{w}\,, \tag{25}
\end{align}

where the first equality is the definition (23a) of $\widehat{\mathbf{w}}$ evaluated at $\mathbf{x}=\bm{\mathcal{F}}(\mathbf{u},\mathbf{w})$, the second follows from (24), and the third from (19). Using the definition of $\mathcal{L}_{p}$-gain for the operator $\mathbf{y}=\bm{\Delta}(\mathbf{x},\mathbf{u})$ one has $|\mathbf{y}|\leq\gamma(\bm{\Delta})(|\mathbf{x}|+|\mathbf{u}|)$, and, by using (25) and $\mathbf{u}=\bm{\mathcal{M}}(\widehat{\mathbf{w}})$, one obtains

\begin{align*}
|\widehat{\mathbf{w}}| &\leq \gamma(\bm{\Delta})\left(|\bm{\mathcal{F}}(\mathbf{u},\mathbf{w})|+|\mathbf{u}|\right)+|\mathbf{w}|\\
&\leq \gamma(\bm{\Delta})\left(\gamma(\bm{\mathcal{F}})|\mathbf{w}|+\gamma(\bm{\mathcal{F}})|\mathbf{u}|+|\mathbf{u}|\right)+|\mathbf{w}|\\
&\leq \left(\gamma(\bm{\Delta})\gamma(\bm{\mathcal{F}})+1\right)|\mathbf{w}|+\gamma(\bm{\Delta})\left(\gamma(\bm{\mathcal{F}})+1\right)\gamma(\bm{\mathcal{M}})|\widehat{\mathbf{w}}|\,.
\end{align*}

The relationship above implies that

$$|\widehat{\mathbf{w}}|\leq\left(\frac{\gamma(\bm{\Delta})\gamma(\bm{\mathcal{F}})+1}{1-\gamma(\bm{\Delta})\gamma(\bm{\mathcal{M}})\left(\gamma(\bm{\mathcal{F}})+1\right)}\right)|\mathbf{w}|\,. \tag{26}$$

Next, we plug the upper bound (26) into the inequality $|\mathbf{u}|\leq\gamma(\bm{\mathcal{M}})|\widehat{\mathbf{w}}|$ to obtain

$$|\mathbf{u}|\leq\left(\frac{\gamma(\bm{\mathcal{M}})\left(\gamma(\bm{\Delta})\gamma(\bm{\mathcal{F}})+1\right)}{1-\gamma(\bm{\Delta})\gamma(\bm{\mathcal{M}})\left(\gamma(\bm{\mathcal{F}})+1\right)}\right)|\mathbf{w}|\,, \tag{27}$$

and subsequently, we plug (27) into the inequality $|\mathbf{x}|\leq\gamma(\bm{\mathcal{F}})(|\mathbf{u}|+|\mathbf{w}|)$ to obtain

$$|\mathbf{x}|\leq\left(\gamma(\bm{\mathcal{F}})\frac{1+\gamma(\bm{\mathcal{M}})\left(1-\gamma(\bm{\Delta})\right)}{1-\gamma(\bm{\Delta})\gamma(\bm{\mathcal{M}})\left(\gamma(\bm{\mathcal{F}})+1\right)}\right)|\mathbf{w}|\,. \tag{28}$$

The last step is to verify that the maps $\mathbf{w}\rightarrow\mathbf{x}$ and $\mathbf{w}\rightarrow\mathbf{u}$ have a finite $\mathcal{L}_{p}$-gain. This is done by checking that the gains in (27) and (28) are positive values when the gain of $\bm{\mathcal{M}}$ is sufficiently small. If (22) holds, we have that $\gamma(\bm{\Delta})\gamma(\bm{\mathcal{M}})(\gamma(\bm{\mathcal{F}})+1)<1$, and hence the denominator in (27) is positive. Since the numerator of (27) is always positive, we conclude that the map $\mathbf{w}\rightarrow\mathbf{u}$ has a finite $\mathcal{L}_{p}$-gain. Similarly for (28), since (22) implies that $\gamma(\bm{\mathcal{M}})\gamma(\bm{\Delta})<1$, we have that both numerator and denominator are positive. This implies that the map $\mathbf{w}\rightarrow\mathbf{x}$ has a finite $\mathcal{L}_{p}$-gain, as desired. ∎
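The chain of bounds (26)-(28) is straightforward to evaluate numerically. The helper below is a minimal sketch, not part of the paper: the function name and the sample gain values are assumptions introduced for illustration. It checks condition (22) through the sign of the common denominator and returns the three gain bounds.

```python
from typing import Tuple


def closed_loop_gain_bounds(gamma_delta: float, gamma_f: float,
                            gamma_m: float) -> Tuple[float, float, float]:
    """Return the bounds (26), (27), (28) on the gains of w -> w_hat, w -> u, w -> x."""
    if min(gamma_delta, gamma_f, gamma_m) < 0.0:
        raise ValueError("gains must be nonnegative")
    # Common denominator of (26)-(28). It is positive when (22) holds;
    # for gamma_delta = 0 it equals 1, so any finite gamma_m is admissible.
    denom = 1.0 - gamma_delta * gamma_m * (gamma_f + 1.0)
    if denom <= 0.0:
        raise ValueError("condition (22) is violated")
    bound_w_hat = (gamma_delta * gamma_f + 1.0) / denom                 # (26)
    bound_u = gamma_m * bound_w_hat                                     # (27)
    bound_x = gamma_f * (1.0 + gamma_m * (1.0 - gamma_delta)) / denom   # (28)
    return bound_w_hat, bound_u, bound_x


# Hypothetical gains: gamma(Delta) = 0.1, gamma(F) = 4, gamma(M) = 1.
bounds = closed_loop_gain_bounds(0.1, 4.0, 1.0)
# Perfect model: the bounds reduce to 1, gamma(M), and gamma(F) * (1 + gamma(M)).
nominal = closed_loop_gain_bounds(0.0, 4.0, 1.0)
```

With these sample values, (22) requires $\gamma(\bm{\mathcal{M}})<2$, so the call with $\gamma(\bm{\mathcal{M}})=1$ is admissible, whereas a larger gain such as $3$ would be rejected.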

The robustness condition (22) highlights a trade-off between ($i$) the degree of tolerable uncertainty in the mismatch between nominal and real dynamics, and ($ii$) the extent of the set of stabilizing control policies that we are permitted to optimize over. Specifically, (22) ensures that, for any model mismatch $\bm{\Delta}\in\mathcal{L}_{p}$, there always exists a range of admissible gains for $\bm{\mathcal{M}}$ such that the closed-loop system is stable. This enables one to freely learn over all appropriately gain-bounded operators. Further note that Theorem 3 is not conservative when $\bm{\Delta}=0$; this is unlike the classical application of the small-gain theorem [42], which would enforce that $\gamma(\mathbf{K})<(\gamma(\bm{\mathcal{F}}))^{-1}$ even when $\bm{\Delta}=0$. Indeed, when the model is fully known, the right-hand side of (22) diverges to infinity, so the gain of $\bm{\mathcal{M}}$ may take any finite value with no upper bound imposed, thereby recovering the completeness result of Theorem 1.
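To make the trade-off concrete, the following sketch simulates the IMC policy (23) on a hypothetical scalar plant whose true dynamics differ from the nominal model. The linear plant, the mismatch, the static choice of $\bm{\mathcal{M}}$, and all numerical values are illustrative assumptions, as is the convention $x_0=w_0=\widehat{w}_0$.

```python
from typing import List, Sequence, Tuple


def f_nominal(x_prev: float, u_prev: float) -> float:
    """Hypothetical nominal model f_hat used inside the controller."""
    return 0.5 * x_prev + u_prev


def delta(x_prev: float, u_prev: float) -> float:
    """Hypothetical model mismatch, unknown to the controller."""
    return 0.1 * x_prev


def m_operator(w_hat_hist: Sequence[float]) -> float:
    """Hypothetical static operator M with gain 0.3."""
    return -0.3 * w_hat_hist[-1]


def simulate_imc(w: Sequence[float]) -> Tuple[List[float], List[float], List[float]]:
    xs: List[float] = []
    us: List[float] = []
    w_hats: List[float] = []
    for t in range(len(w)):
        if t == 0:
            x_t = w[0]  # assumed convention: x_0 = w_0
            w_hat_t = x_t
        else:
            # Real plant (19): nominal model plus mismatch, plus disturbance.
            x_t = f_nominal(xs[-1], us[-1]) + delta(xs[-1], us[-1]) + w[t]
            # (23a): the controller only knows the nominal model.
            w_hat_t = x_t - f_nominal(xs[-1], us[-1])
        xs.append(x_t)
        w_hats.append(w_hat_t)
        us.append(m_operator(w_hats))  # (23b)
    return xs, us, w_hats


# Impulse disturbance followed by zeros.
xs, us, w_hats = simulate_imc([1.0] + [0.0] * 59)
```

Because of the mismatch, the reconstruction $\widehat{w}_t$ no longer equals $w_t$ (here $\widehat{w}_1=0.1\,x_0$ although $w_1=0$), yet the state still decays. For this linear example one can take $\gamma(\bm{\Delta})=0.1$ and $\gamma(\bm{\mathcal{F}})\leq 2.5$, so the chosen gain $0.3$ sits well inside the range allowed by (22).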

Remark 1 (Robust stability of nonlinear SLS).

The authors of [29] characterize robust stability of nonlinear SLS against mismatch in satisfying the achievability constraint (17b). Specifically, [29] focuses on the scenario where the control policy is the mapping $\mathbf{x}\rightarrow\mathbf{u}$ in the form

\begin{align}
\tilde{\mathbf{w}} &= \mathbf{x}-(\bm{\Psi}^{\mathbf{x}}-\mathbf{I})\tilde{\mathbf{w}}\,, \tag{29}\\
\mathbf{u} &= \bm{\Psi}^{\mathbf{u}}(\tilde{\mathbf{w}})\,, \tag{30}
\end{align}

for some $(\bm{\Psi}^{\mathbf{x}},\bm{\Psi}^{\mathbf{u}})\in\mathcal{L}_{p}$ which are not assumed to perfectly comply with (17b). Accordingly, the authors define a mismatch operator

$$\bm{\Xi}=\mathbf{F}(\bm{\Psi}^{\mathbf{x}},\bm{\Psi}^{\mathbf{u}})+\mathbf{I}-\bm{\Psi}^{\mathbf{x}}\,. \tag{31}$$

Then, Theorem IV.2 of [29] proves closed-loop stability as long as $\gamma\left(\bm{\Xi}\right)<1$. Since $\bm{\Xi}$ measures the degree of violation of the achievability constraint rather than the degree of model uncertainty, a robust stability analysis based on verifying $\gamma(\bm{\Xi})<1$ tailored to the case $\mathbf{F}=\widehat{\mathbf{F}}+\bm{\Delta}$ may not be straightforward, and it is not attempted in [29]. For this case, instead, Theorem 3 provides an upper bound on the admissible gains for $\bm{\mathcal{M}}$; this is achieved by exploiting the IMC structure of the policy (23), and bounding the effect of model uncertainty on the closed-loop map for the ground-truth system.

IV-B Distributed controllers for large-scale plants

When dealing with large-scale cyber-physical systems, one may consider that the plant (1) is composed of a network of $N$ dynamically interconnected nonlinear subsystems. To model this scenario, we introduce an undirected coupling graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$, where the nodes $\mathcal{V}=\{1,\dots,N\}$ represent the subsystems in the network, and the set of edges $\mathcal{E}$ encodes pairs of subsystems $\{i,j\}$ that are dynamically interconnected through state variables. Specifically, the dynamics of each subsystem $i\in\mathcal{V}$ is

$$x_{t}^{[i]}=f_{t}^{[i]}(x^{[\mathcal{N}_{i}]}_{t-1:0},u^{[i]}_{t-1:0})+w^{[i]}_{t}\,,\quad t=1,2,\ldots \tag{32}$$

where state and input of each subsystem $i\in\mathcal{V}$ at time $t=1,2,\ldots$ are denoted by $x_{t}^{[i]}\in\mathbb{R}^{n_{i}}$ and $u_{t}^{[i]}\in\mathbb{R}^{m_{i}}$ respectively, and the initial state is $x^{[i]}_{0}\in\mathbb{R}^{n_{i}}$. In operator form we have

$$\mathbf{x}^{[i]}=\mathbf{F}^{[i]}(\mathbf{x}^{[\mathcal{N}_{i}]},\mathbf{u}^{[i]})+\mathbf{w}^{[i]}\,, \tag{33}$$

where $\mathbf{F}^{[i]}:\ell^{n_{\mathcal{N}_{i}}}\times\ell^{m_{i}}\rightarrow\ell^{n_{i}}$. Note that, by stacking the subsystem dynamics in (32) together, we recover a system in the form (1), where $x_{t}=\operatorname{col}_{i\in\mathcal{V}}(x_{t}^{[i]})\in\mathbb{R}^{n}$, $u_{t}=\operatorname{col}_{i\in\mathcal{V}}(u_{t}^{[i]})\in\mathbb{R}^{m}$, and $w_{t}=\operatorname{col}_{i\in\mathcal{V}}(w_{t}^{[i]})\in\mathbb{R}^{n}$.

When controlling networked systems in the form (33), a common scenario is that the local feedback controller $u_{t}^{[i]}$ can only access information made available by its neighbors according to a communication network with the same topology as $\mathcal{G}$. This requirement translates into imposing the following additional constraint on the performance-boosting problem (Problem 1):

$$\mathbf{u}^{[i]}=\mathbf{K}^{[i]}(\mathbf{x}^{[\mathcal{N}_{i}]})\,,\quad\forall i\in\mathcal{V}\,. \tag{34}$$

The challenge becomes to parametrize only those stabilizing policies that are distributed according to (34). This can be achieved by exploiting the IMC controller architecture (11) in combination with the network sparsity of $\mathbf{F}$ highlighted in (33). Let us consider, for example, the networked plant of Figure 3, where $\mathbf{u}^{[i]}$ depends on the local disturbance reconstructions $\widehat{\mathbf{w}}^{[i]}$ only, that is, $\mathbf{u}^{[i]}=\bm{\mathcal{M}}^{[i]}(\widehat{\mathbf{w}}^{[i]})$. In order to reconstruct $\widehat{\mathbf{w}}^{[1]}$, agent $i=1$ needs to evaluate the local dynamics $\mathbf{F}^{[1]}(\mathbf{x}^{[1]},\mathbf{x}^{[3]},\mathbf{u}^{[1]})$; this, in turn, requires a measurement of the state $\mathbf{x}^{[3]}$ over time. Repeating this reasoning for the agents $i=2$ and $i=3$, one obtains an overall control policy $\mathbf{K}(\mathbf{x})$ whose agent-wise components are computed relying on measurements from neighboring subsystems only, thus complying with (34). We formalize this reasoning in the next proposition.

Proposition 1.

Let the graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$ describe the topology of a plant $\mathbf{F}$ as per (33). Consider an IMC control policy (11) where the operator $\bm{\mathcal{M}}\in\mathcal{L}_{p}$ is decentralized, that is, $\bm{\mathcal{M}}^{[i]}(\widehat{\mathbf{w}})=\bm{\mathcal{M}}^{[i]}(\widehat{\mathbf{w}}^{[i]})$ for every agent $i\in\mathcal{V}$. Then, the closed-loop system is $\ell_{p}$-stable and the corresponding control policy $\mathbf{u}=\mathbf{K}(\mathbf{x})$ is distributed according to (34).

Proof.

Since $\bm{\mathcal{M}}\in\mathcal{L}_{p}$, the closed-loop system is $\ell_{p}$-stable by Theorem 1. By (33), we have $\widehat{\mathbf{w}}^{[i]}=\mathbf{x}^{[i]}-\mathbf{F}^{[i]}(\mathbf{x}^{[\mathcal{N}_{i}]},\mathbf{u}^{[i]})$. Hence, agent $i$ only needs measurements of the neighboring states according to $\mathcal{G}$ and local past inputs, thus complying with (34). ∎
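A minimal sketch of the decentralized architecture of Proposition 1 follows. It assumes a perfectly known model, scalar subsystems, the convention $x_0=w_0$, and a hypothetical three-node coupling graph in which nodes 1 and 2 are each coupled to node 3 (the actual topology of Figure 3 is not reproduced here). The local dynamics and local policies are illustrative choices only.

```python
from typing import Dict, List, Tuple

# Hypothetical neighbor sets N_i (each set includes i itself).
NEIGHBORS: Dict[int, Tuple[int, ...]] = {1: (1, 3), 2: (2, 3), 3: (1, 2, 3)}


def local_dynamics(i: int, x_nbrs: Dict[int, float], u_i: float) -> float:
    """Hypothetical f^[i]: stable local term plus weak coupling with neighbors."""
    coupling = sum(x for j, x in x_nbrs.items() if j != i)
    return 0.5 * x_nbrs[i] + 0.1 * coupling + u_i


def local_policy(w_hat_hist: List[float]) -> float:
    """Hypothetical M^[i]: uses the local reconstruction w_hat^[i] only."""
    return -0.2 * w_hat_hist[-1]


def simulate(w: Dict[int, List[float]]):
    agents = sorted(NEIGHBORS)
    horizon = len(w[agents[0]])
    x: Dict[int, List[float]] = {i: [] for i in agents}
    u: Dict[int, List[float]] = {i: [] for i in agents}
    w_hat: Dict[int, List[float]] = {i: [] for i in agents}
    for t in range(horizon):
        for i in agents:  # plant update, as in (32)
            if t == 0:
                x[i].append(w[i][0])  # assumed convention: x_0 = w_0
            else:
                nbrs = {j: x[j][t - 1] for j in NEIGHBORS[i]}
                x[i].append(local_dynamics(i, nbrs, u[i][t - 1]) + w[i][t])
        for i in agents:  # controller update: neighbor states and own inputs only
            if t == 0:
                w_hat[i].append(x[i][0])
            else:
                nbrs = {j: x[j][t - 1] for j in NEIGHBORS[i]}
                w_hat[i].append(x[i][t] - local_dynamics(i, nbrs, u[i][t - 1]))
            u[i].append(local_policy(w_hat[i]))
    return x, u, w_hat


w = {1: [1.0, 0.0, 0.0, 0.0], 2: [0.0, 0.5, 0.0, 0.0], 3: [0.0, 0.0, 0.0, 0.0]}
x, u, w_hat = simulate(w)
```

In this sketch agent 1 never reads the state of node 2, and each local reconstruction matches the local disturbance up to rounding, which is the mechanism the proof relies on.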

The result of Proposition 1 can be extended to more complex cases. First, one can use local operators $\bm{\mathcal{M}}^{[i]}\in\mathcal{L}_{p}$ that, besides $\widehat{\mathbf{w}}^{[i]}$, have access to disturbance reconstructions $\widehat{\mathbf{w}}^{[j]}$ or control variables $\mathbf{u}^{[j]}$ computed at locations $j\neq i$. While these architectures can be beneficial, e.g., for counteracting disturbances affecting other subsystems before they propagate to subsystem $i$ through coupling, they require additional communication channels $\{i,j\}$ if $j\not\in\mathcal{N}_{i}$. Moreover, one has to use local operators $\bm{\mathcal{M}}^{[i]}$ guaranteeing that the whole operator $\bm{\mathcal{M}}$ belongs to $\mathcal{L}_{p}$.
To this purpose, in general, it is not enough that $\bm{\mathcal{M}}^{[i]}\in\mathcal{L}_{p}$ because the dependency on $\widehat{\mathbf{w}}^{[j]}$ and $\mathbf{u}^{[j]}$ for $j\neq i$ can induce loop interconnections that can destabilize the closed-loop system. Classes of local operators $\bm{\mathcal{M}}^{[i]}$ yielding $\bm{\mathcal{M}}\in\mathcal{L}_{p}$ have been proposed in [43, 44] by using dissipativity theory.

Figure 3: Example of networked dynamics (33) and decentralized IMC controller for agent $i=1$.

V Learning to Boost Performance using Unconstrained Optimization

Leveraging the theoretical results of previous sections, we reformulate the performance-boosting problem in a form that facilitates optimizing by automatic differentiation and unconstrained gradient descent. This enables the use of highly flexible cost functions for complex nonlinear optimal control tasks. By design, the proposed approach guarantees closed-loop stability throughout the optimization process. We assess the effectiveness of the proposed methodology in achieving optimal performance through numerical experiments, in Section VI.

V-A IMC-based reformulation of performance boosting

The main value of Theorem 1 is that it enables reformulating Problem 1 as follows.

\begin{align}
\min_{\bm{\mathcal{M}}\in\mathcal{L}_{p}} \quad & \mathbb{E}_{w_{T:0}}\left[L(x_{T:0},u_{T:0})\right] \tag{35a}\\
\operatorname{s.t.} \quad & x_{t}=f_{t}(x_{t-1:0},u_{t-1:0})+w_{t},\quad x_{0}=w_{0}, \tag{35b}\\
& u_{t}=\mathcal{M}_{t}(w_{t:0})\,,\quad t=1,2,\ldots\,. \tag{35c}
\end{align}

Indeed, (6) corresponds to (35b)-(35c). If the exact dynamics $f_{t}$ in (35b) are not known, they are simply replaced by the nominal model $\widehat{f}_{t}$.

The reformulation (35) offers significant computational advantages as compared to Problem 1. In the classical linear quadratic case [Footnote 6: that is, when $f_{t}$ and $\bm{\mathcal{M}}$ are linear and $L$ is quadratic positive definite], (35) becomes strongly convex in $\bm{\mathcal{M}}$, which enables the use of efficient convex optimization for finding a globally optimal solution [45, 40, 46, 47, 36]. In the general nonlinear case, searching over nonlinear operators $\bm{\mathcal{M}}\in\mathcal{L}_{p}$ remains significantly easier than tackling Problem 1 directly. Indeed, the set $\mathcal{K}$ of controllers $\mathbf{K}(\cdot)$ complying with (5b) is, in general, difficult to parametrize. This is mainly because, given two stabilizing policies $\mathbf{K}_{1},\mathbf{K}_{2}$, neither their convex combinations $\mathbf{K}_{3}=\gamma\mathbf{K}_{1}+(1-\gamma)\mathbf{K}_{2}$ with $\gamma\in[0,1]$ nor their cascaded composition $\mathbf{K}_{4}=\mathbf{K}_{2}(\bm{\Phi}^{\mathbf{x}}[\bm{F},\mathbf{K}_{1}])$ are stabilizing policies in general; these issues are well known for the special case of linear systems [48, 45].
Hence, it is difficult to parameterize stabilizing policies, for instance, by composing or summing together base stabilizing operators. Instead, thanks to $\mathcal{L}_{p}$ being convex and closed under composition, there exist methods for parametrizing rich subsets of $\mathcal{L}_{p}$ through free parameters $\theta\in\mathbb{R}^{d}$, that is, to define operators $\bm{\mathcal{M}}(\theta)$ such that

$$\bm{\mathcal{M}}(\theta)\in\mathcal{L}_{p},\quad\forall\theta\in\mathbb{R}^{d}\,. \tag{36}$$

This allows turning (35) into an unconstrained optimization problem over $\theta\in\mathbb{R}^{d}$.
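Property (36) can be illustrated with a deliberately simple toy example, which is not the REN parametrization reviewed later in this section. The sketch assumes a scalar linear filter whose pole is obtained by squashing a free parameter through `tanh`; the names `stable_filter`, `gain_bound`, and `l2_norm` are hypothetical. For every real parameter vector the filter has a finite gain, bounded by the $\ell_1$ norm of its impulse response.

```python
import math


def stable_filter(theta):
    """Map free parameters theta = (theta_a, theta_b) to a scalar linear filter.

    The pole a = tanh(theta_a) lies in (-1, 1) for every real theta_a, so no
    constraint on theta is needed. (In floating point, tanh saturates to +/-1
    for very large |theta_a|; that corner case is ignored in this sketch.)
    """
    a, b = math.tanh(theta[0]), theta[1]

    def apply(w):
        xi, out = 0.0, []
        for w_t in w:
            xi = a * xi + b * w_t      # xi_t = a * xi_{t-1} + b * w_t
            out.append(xi)             # u_t = xi_t
        return out

    gain_bound = abs(b) / (1.0 - abs(a))   # l1 norm of the impulse response
    return apply, gain_bound


def l2_norm(seq):
    return math.sqrt(sum(v * v for v in seq))


# A finite-energy test signal and a few arbitrary, unconstrained parameters.
w = [1.0, -0.5, 0.25, 0.0, 2.0] + [0.0] * 200
checks = []
for theta in [(-3.0, 1.0), (0.0, -2.0), (0.7, 0.3), (4.0, 5.0)]:
    apply, gain_bound = stable_filter(theta)
    checks.append(l2_norm(apply(w)) <= gain_bound * l2_norm(w) + 1e-9)
```

The design choice mirrors (36): stability is built into the map from $\theta$ to the operator, so an optimizer may move $\theta$ freely without ever leaving the stable class.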

The last issue to be addressed is the computation of the average in (35a) that, as noticed before, is generally intractable. This is usually circumvented by approximating the exact average with its empirical counterpart obtained using a set of samples $\{w_{T:0}^{s}\}_{s=1}^{S}$ drawn from the distribution $\mathcal{D}$. One then obtains the finite-dimensional optimization problem:

\begin{align}
\min_{\theta\in\mathbb{R}^{d}} \quad & \frac{1}{S}\sum_{s=1}^{S}L(x_{T:0}^{s},u_{T:0}^{s}) \tag{37a}\\
\operatorname{s.t.} \quad & x_{t}^{s}=f_{t}(x_{t-1}^{s},u_{t-1}^{s})+w_{t}^{s}\,,\quad w_{0}^{s}=x_{0}^{s}\,, \tag{37b}\\
& u_{t}^{s}=\mathcal{M}_{t}(\theta)(w_{t:0}^{s})\,,\quad t=0,1,2,\ldots\,, \tag{37c}
\end{align}

where $x_{T:0}^{s}$ and $u_{T:0}^{s}$ are the states and inputs obtained when the disturbance $w_{T:0}^{s}$ is applied. While in this work we only consider the empirical cost (37a), the closed-loop performance when faced with out-of-sample noise sequences is further investigated in [49].

Finally, we highlight that (37b) and (37c) can be seen as the equations of layer $t$ of a neural network with depth $T$ parametrized by $\theta$. When $\mathcal{M}_{t}$, for $t=0,1,\ldots$, is sufficiently smooth, the absence of constraints on $\theta$ enables the use of powerful packages, such as TensorFlow [50] and PyTorch [51], leveraging automatic differentiation and backpropagation for optimizing the controller through gradient descent.
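The structure of (37) can be sketched on a toy problem using only the Python standard library. All specifics here are assumptions for illustration: the scalar plant $f(x,u)=0.8x+u$, the quadratic cost, the finite impulse response operator used as $\mathcal{M}(\theta)$, and the helper names `rollout_cost`, `fd_gradient`, and `descend`. Central finite differences with a backtracking step stand in for the automatic differentiation that TensorFlow or PyTorch would provide.

```python
import random


def rollout_cost(theta, w, a=0.8):
    """Cost of one rollout of (37b)-(37c) for the toy plant f(x, u) = a*x + u."""
    x, u_prev, cost = 0.0, 0.0, 0.0
    for t in range(len(w)):
        x = w[0] if t == 0 else a * x + u_prev + w[t]      # (37b), with x_0 = w_0
        # (37c): FIR operator on the disturbance history, stable for every theta
        u = sum(th * w[t - k] for k, th in enumerate(theta) if t - k >= 0)
        cost += x * x + 0.1 * u * u                        # hypothetical quadratic cost
        u_prev = u
    return cost


def empirical_cost(theta, samples):
    """Sample average (37a) over the disturbance sequences."""
    return sum(rollout_cost(theta, w) for w in samples) / len(samples)


def fd_gradient(fun, theta, eps=1e-6):
    """Central finite differences; a stand-in for automatic differentiation."""
    grad = []
    for i in range(len(theta)):
        up, dn = list(theta), list(theta)
        up[i] += eps
        dn[i] -= eps
        grad.append((fun(up) - fun(dn)) / (2 * eps))
    return grad


def descend(fun, theta, iters=50, step=0.1):
    """Gradient descent that halves the step until the cost decreases."""
    best = fun(theta)
    for _ in range(iters):
        g = fd_gradient(fun, theta)
        s = step
        while s > 1e-12:
            cand = [th - s * gi for th, gi in zip(theta, g)]
            c = fun(cand)
            if c < best:
                theta, best = cand, c
                break
            s *= 0.5
    return theta, best


rng = random.Random(0)
# Each sample: a random initial condition w_0 followed by small disturbances.
samples = [[rng.gauss(0.0, 1.0)] + [rng.gauss(0.0, 0.1) for _ in range(30)]
           for _ in range(20)]
cost_fn = lambda th: empirical_cost(th, samples)
theta0 = [0.0, 0.0, 0.0]
cost0 = cost_fn(theta0)
theta_opt, cost_opt = descend(cost_fn, theta0)
```

Because the operator is stable for every value of $\theta$, every intermediate iterate of `descend` corresponds to a stabilizing policy for this toy plant, which is the property that makes halting the optimization early harmless.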

V-B Free parameterizations of $\mathcal{L}_{2}$ subsets

As highlighted in Section V-A, the possibility of obtaining effective controllers by solving (37) critically depends on our ability to parametrize $\mathcal{L}_{p}$ operators. The main obstacle is that the space $\mathcal{L}_{p}$ is infinite-dimensional. Hence, for implementation, one usually restricts the search to subsets of $\mathcal{L}_{p}$ described by finitely many parameters. When linear systems are considered, one can search over Finite Impulse Response (FIR) transfer matrices $\mathbf{M}=\sum_{i=0}^{N}M[i]z^{-i}\in\mathcal{TF}_{s}$ and then optimize over the finitely many real matrices $M[i]$. Increasingly less conservative solutions can be obtained by increasing the FIR order $N$. However, the FIR approach limits the search to linear control policies.

Recently, [30, 31, 52] have proposed finite-dimensional DNN approximations of nonlinear $\mathcal{L}_{2}$ operators. In the sequel, we briefly review the Recurrent Equilibrium Network (REN) models proposed in [31]. An operator $\bm{\mathcal{M}}:\ell^{n}\rightarrow\ell^{m}$ is a REN if the relationship $\mathbf{u}=\bm{\mathcal{M}}(\widehat{\mathbf{w}})$ is recursively generated by the following dynamical system:

$$\begin{bmatrix}\xi_{t}\\ z_{t}\\ u_{t}\end{bmatrix}=\overbrace{\begin{bmatrix}A_{1}&B_{1}&B_{2}\\ C_{1}&D_{11}&D_{12}\\ C_{2}&D_{21}&D_{22}\end{bmatrix}}^{W}\begin{bmatrix}\xi_{t-1}\\ \sigma(z_{t})\\ w_{t}\end{bmatrix}+\overbrace{\begin{bmatrix}b_{x,t}\\ b_{z,t}\\ b_{w,t}\end{bmatrix}}^{b_{t}}\,,\quad\xi_{-1}=0\,, \tag{38}$$

where $\xi_{t}\in\mathbb{R}^{q}$, $z_{t}\in\mathbb{R}^{r}$, $b_{x,t},b_{z,t},b_{w,t}\in\ell_{\infty}$ [Footnote 7: This is slightly different from the original REN model [31], where these signals are assumed to be constant], and $\sigma:\mathbb{R}\rightarrow\mathbb{R}$, the activation function, is applied element-wise. Further, $\sigma(\cdot)$ must be piecewise differentiable and with first derivatives restricted to the interval $[0,1]$. As noted in [31], RENs subsume many existing DNN architectures. In general, RENs define deep equilibrium network models [53] due to the implicit relationships defining $z_{t}$ in the second block row of (38). By restricting $D_{11}$ to be strictly lower-triangular, the value of $z_{t}$ can be computed explicitly, thus significantly speeding up computations [31]. To give an example of the expressivity of (38), by suitably choosing the size and zero pattern of matrices in (38), RENs can provide nonlinear systems in the form

$$\begin{aligned}\xi_{t}&=\hat{A}\xi_{t-1}+\hat{B}\,\text{NN}^{\xi}(\xi_{t-1},\widehat{w}_{t})\\ u_{t}&=\hat{C}\xi_{t}+\hat{D}\,\text{NN}^{u}(\xi_{t-1},\widehat{w}_{t})\end{aligned}$$

where $\hat{A}$, $\hat{B}$, $\hat{C}$, $\hat{D}$ are arbitrary matrices of suitable dimensions and $\text{NN}^{\star}$, $\star\in\{\xi,u\}$, are neural networks of depth $L$ given by the relations

$$\begin{aligned}\tilde{z}_{0,t}^{\star}&=[\xi_{t-1}^{\top},\hat{w}_{t}^{\top}]^{\top},\\ \tilde{z}_{k+1,t}^{\star}&=\sigma(W_{k}^{\star}\tilde{z}_{k,t}^{\star}+b_{k}^{\star}),\quad k=0,\ldots,L-1\end{aligned}$$

where $W_{k}^{\star}$ and $b_{k}^{\star}$ are the layer weights and biases, respectively, and $\tilde{z}_{L,t}^{\star}$ is the NN output.

For an arbitrary choice of $W$ and $b_{t}$, the map $\bm{\mathcal{M}}$ induced by (38) may not lie in $\mathcal{L}_{2}$. The work [31] provides an explicit smooth mapping $\Theta:\mathbb{R}^{d}\rightarrow\mathbb{R}^{(q+r+m)\times(q+r+n)}$ from unconstrained training parameters $\theta\in\mathbb{R}^{d}$ to a matrix $W=\Theta(\theta)\in\mathbb{R}^{(q+r+m)\times(q+r+n)}$ defining (38), with the property that the corresponding operator $\bm{\mathcal{M}}(\theta)$ lies in $\mathcal{L}_{2}$ by design when $b_{t}=0$ [Footnote 8: Furthermore, RENs enjoy contractivity, although the theoretical results of this paper do not rely on this property]. This approach can be easily generalized by including vectors $b_{t}$, $t=1,\ldots,T$, in the set of trainable parameters and assuming $b_{t}=0$ for $t>T$.
Recently, free parameterizations of continuous-time $\mathcal{L}_{2}$ operators through RENs and port-Hamiltonian systems have also been proposed in [52] and [54], respectively.
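The explicit evaluation of $z_{t}$ under a strictly lower-triangular $D_{11}$ can be sketched as a forward pass of (38). This is a minimal illustration under stated assumptions: the helper names `matvec`, `add`, and `ren_step`, the choice of `tanh` as activation (its slope lies in $[0,1]$), and the numerical matrices are all hypothetical. The matrix $W$ below is chosen by hand, so the sketch does not reproduce the mapping $\Theta$ of [31] and gives no $\mathcal{L}_{2}$ guarantee.

```python
import math


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def add(*vectors):
    return [sum(vals) for vals in zip(*vectors)]


def ren_step(W, b, xi_prev, w, sigma=math.tanh):
    """One step of (38), assuming D11 is strictly lower-triangular."""
    (A1, B1, B2), (C1, D11, D12), (C2, D21, D22) = W
    bx, bz, bw = b
    base = add(matvec(C1, xi_prev), matvec(D12, w), bz)
    z, s = [], []                       # s[j] holds sigma(z[j])
    for i, row in enumerate(D11):
        # Row i only involves sigma(z_j) for j < i, which is already available.
        z.append(base[i] + sum(row[j] * s[j] for j in range(i)))
        s.append(sigma(z[i]))
    xi = add(matvec(A1, xi_prev), matvec(B1, s), matvec(B2, w), bx)
    u = add(matvec(C2, xi_prev), matvec(D21, s), matvec(D22, w), bw)
    return xi, z, u


# Hypothetical sizes: q = 1 state, r = 2 neurons, n = m = 1.
W = (([[0.5]], [[0.1, -0.2]], [[1.0]]),
     ([[1.0], [0.5]], [[0.0, 0.0], [0.3, 0.0]], [[0.2], [-0.1]]),
     ([[1.0]], [[0.5, 0.5]], [[0.0]]))
b = ([0.0], [0.0, 0.0], [0.0])
xi, z, u = ren_step(W, b, xi_prev=[0.2], w=[1.0])
```

The loop solves the implicit second block row by forward substitution, so no fixed-point iteration is needed at any time step.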

VI Numerical Experiments: the Magic of the Cost

In this section, we test the flexibility of performance boosting by considering cooperative robotics problems. Firstly, we validate the fail-safe feature of the design approach by showing that closed-loop stability is preserved during and after training, both when the system model is known and when it is uncertain. Secondly, we exploit the freedom in selecting the cost $L(x_{T:0},u_{T:0})$ to include appropriate terms aimed at promoting complex closed-loop behaviors.

In all the examples, we consider two point-mass vehicles, each with position $p_{t}^{[i]}\in\mathbb{R}^{2}$ and velocity $q_{t}^{[i]}\in\mathbb{R}^{2}$, for $i=1,2$, subject to nonlinear drag forces (e.g., air or water resistance). The discrete-time model for vehicle $i$ is

$$\begin{bmatrix}p_{t}^{[i]}\\ q_{t}^{[i]}\end{bmatrix}=\begin{bmatrix}p_{t-1}^{[i]}\\ q_{t-1}^{[i]}\end{bmatrix}+T_{s}\begin{bmatrix}q_{t-1}^{[i]}\\ (m^{[i]})^{-1}\left(-C(q_{t-1}^{[i]})+F_{t-1}^{[i]}\right)\end{bmatrix}\,, \tag{39}$$

where $m^{[i]}>0$ is the mass, $F_{t}^{[i]}\in\mathbb{R}^{2}$ denotes the force control input, $T_{s}>0$ is the sampling time and $C^{[i]}:\mathbb{R}^{2}\rightarrow\mathbb{R}^{2}$ is a drag function given by $C^{[i]}(s)=b_{1}^{[i]}s-b_{2}^{[i]}\tanh(s)$, for some $0<b_{2}^{[i]}<b_{1}^{[i]}$. Each vehicle must reach a target position $\overline{p}^{[i]}\in\mathbb{R}^{2}$ with zero velocity in a stable way.
This elementary goal can be achieved by using a base proportional controller

$${F^{\prime}}_{t}^{[i]}={K^{\prime}}^{[i]}(\bar{p}^{[i]}-p_{t}^{[i]})\,, \tag{40}$$

with $K^{\prime[i]}=\operatorname{diag}(k_{1}^{[i]},k_{2}^{[i]})$ and $k_{1}^{[i]},k_{2}^{[i]}>0$. The overall dynamics $f_{t}(x_{t-1:0},u_{t-1:0})$ in (1) are given by (39)-(40) with $F^{[i]}_{t}=F^{\prime[i]}_{t}+u_{t}^{[i]}$, where $x_{t}=(p_{t}^{[1]},q_{t}^{[1]},p_{t}^{[2]},q_{t}^{[2]})$ is the state and $u_{t}=(u_{t}^{[1]},u_{t}^{[2]})$ is a performance-boosting control input to be designed. As per (1), we consider additive disturbances affecting the system dynamics. Thanks to the use of the prestabilizing controller (40), one can show that $\bm{\mathcal{F}}(\mathbf{u},\mathbf{w})\in\mathcal{L}_{2}$.

The goal of the performance-boosting policy is to enforce additional desired behaviors, on top of stability, which are specified in each of the following subsections. In all cases, we parametrize the operator $\boldsymbol{\mathcal{M}}(\theta)\in\mathcal{L}_{2}$ as a REN, see (38). Appendix -A presents all the implementation details, such as parameter values and exact definitions of the cost functions. The code to reproduce our examples, as well as various movies, is available in our GitHub repository: https://github.com/DecodEPFL/performance-boosting_controllers.git

VI-A Robust stability preservation during optimization

We consider the mountains scenario in Figure 4, where each vehicle must reach its target position in a stable way while avoiding collisions with the other vehicle and with two gray obstacles. Each agent is represented by a circle whose radius is the one used in the collision-avoidance specifications. When using the base controller (40), the vehicles successfully reach the target; however, they do so with poor performance since collisions are not avoided, as shown in Figure 4(a).

We select a loss $L(x_{T:0},u_{T:0})$ as the sum of stage costs $l(x_{t},u_{t})$, that is, $L(x_{T:0},u_{T:0})=\sum_{t=0}^{T}l(x_{t},u_{t})$ with

$$l(x_{t},u_{t})=l_{traj}(x_{t},u_{t})+l_{ca}(x_{t})+l_{obs}(x_{t})\,, \tag{41}$$

where $l_{traj}(x_{t},u_{t})=\begin{bmatrix}x_{t}^{\mathsf{T}}&u_{t}^{\mathsf{T}}\end{bmatrix}Q\begin{bmatrix}x_{t}^{\mathsf{T}}&u_{t}^{\mathsf{T}}\end{bmatrix}^{\mathsf{T}}$ with $Q\succeq 0$ penalizes the distance of agents from their targets and the control energy, and $l_{ca}(x_{t})$ and $l_{obs}(x_{t})$ penalize collisions between agents and with obstacles, respectively.

In order to train the performance-boosting controller, we solve (37), using a REN (38) of dimension $q=r=8$. The training data consists of a set of 100 initial positions, i.e., we set $w_{0}=((p^{x}_{0})^{[1]},(p^{y}_{0})^{[1]},0,0,(p^{x}_{0})^{[2]},(p^{y}_{0})^{[2]},0,0)$ and $w_{t}=0$ for $t>0$, where $p^{x}$ and $p^{y}$ denote the $x$ and $y$ coordinates of the vehicles in the Cartesian plane, respectively. Initial positions are sampled from a Gaussian distribution around the nominal initial condition. Figure 4(b-c) shows the nominal and training initial conditions marked with '$\times$' and '$\circ$', respectively, and three test trajectories after the training of the IMC controller.
The trained control policies avoid collisions and achieve optimized trajectories thanks to minimizing (41).

Figure 4: Mountains scenario. Closed-loop trajectories before training (left) and after training (middle and right) over 100 randomly sampled initial conditions marked with $\circ$. Snapshots taken at time-instants $\tau$. Colored (gray) lines show the trajectories in $[0,\tau_{i}]$ ($[\tau_{i},\infty)$). Colored balls (and their radius) represent the agents (and their size for collision avoidance).

VI-A1 Early stopping of the training

We validate the fail-safe property of our IMC control policies. We consider the mountains scenario as above, but interrupt the training process before it reaches a local minimum such as the one attained for Figure 4. In particular, we stop the optimization algorithm after 25%, 50%, and 75% of the total number of epochs. The obtained trajectories are shown in Figure 5. We observe that, even if the performance is not optimized, closed-loop stability is always guaranteed.

Figure 5: Mountains scenario. Closed-loop trajectories after 25%, 50% and 75% of the total training whose final closed-loop trajectories are shown in Figure 4. Even if the performance can be further optimized, stability is always guaranteed.

VI-A2 Model mismatch

We test our trained IMC controller under model mismatch. In particular, we assume that the mass of the true vehicles is uncertain within $\pm 10\%$, and we apply IMC control policies embedding the nominal system with the nominal mass value. Figure 6(a-b) validates the robust $\ell_{2}$-stability of the closed-loop trajectories when the vehicles are lighter and heavier, respectively. Theorem 3 suggests that, in this case, the gain of $\boldsymbol{\mathcal{M}}$ may be sufficiently low to counteract the effect of model uncertainty. Note, however, that checking the sufficient condition (22) requires computing an upper bound on $\gamma(\boldsymbol{\Delta})$, a cumbersome task for general nonlinear systems. Nonetheless, Theorem 3 ensures that, in practical implementation, we can always reduce $\gamma(\boldsymbol{\mathcal{M}})$ enough to eventually meet (22).

VI-B Boosting for safety and invariance certificates

A challenging task in many control applications is to deal with stringent safety constraints on the state variables. Ideally, one would directly add the constraint that

$$x_{t}\in\mathcal{C}\,,\quad\forall t=0,1,\ldots\,, \tag{42}$$

in the IMC-based performance-boosting problem (35), where $\mathcal{C}\subseteq\mathbb{R}^{n}$ defines a safety region. Unfortunately, (42) generally results in intractable constraints over $\boldsymbol{\mathcal{M}}$. Indeed, it may be challenging to even verify that (42) holds for a certain $\boldsymbol{\mathcal{M}}$ due to the infinite-horizon requirement and the involved nonlinearities. Many state-of-the-art approaches for guaranteeing safety hinge on either predictive safety filters [55, 56] or Control Barrier Functions (CBFs) [57, 58]. Safety filters are used during deployment: they override the control input $\mathbf{u}=\boldsymbol{\mathcal{M}}(\widehat{\mathbf{w}})$ with a different (suboptimal) control variable when deemed necessary for guaranteeing safety. Instead, CBFs can be used for safety verification of a given policy, as they allow characterizing $\mathcal{C}$ as a forward invariant set based on a safety-set-defining function $h(x):\mathcal{X}\rightarrow\mathbb{R}$ satisfying $h(x)\geq 0$ for all $x\in\mathcal{C}$. Certifying the forward invariance of $\mathcal{C}$ translates into determining if $h(x)$ is a CBF through verification of some safety conditions. (An exact definition of CBFs in discrete time can be found in [58]; for a more general discussion on CBFs we refer the reader to [57].) In particular, one can verify that, for any $x_{t}\in\mathcal{C}$, if there exists an input $u_{t}$ giving $x_{t+1}$ such that

$$h(x_{t+1})-h(x_{t})+\gamma h(x_{t})\geq 0\,, \tag{43}$$

where $0<\gamma\leq 1$, then $h(x)$ is a CBF.

While optimizing over $\boldsymbol{\mathcal{M}}$ such that (42) holds by design remains an open challenge, we aim to promote forward invariant sets by shaping the cost to include soft safety specifications over a horizon of length $T$. In particular, the new cost term penalizes violations of (43) as per

$$\mathcal{L}_{\operatorname{inv}}=\sum_{t=0}^{T-1}\operatorname{ReLU}\left(h(x_{t})-h(x_{t+1})-\gamma h(x_{t})\right)\,. \tag{44}$$

We consider the mountains scenario again and add the requirement that $(p_{t}^{y})^{[i]}<(\bar{p}^{y})^{[i]}+0.1$ for each vehicle $i=1,2$ and every $t=0,1,\ldots$, where $p_{t}^{y}$ denotes the $y$-coordinate of each center-of-mass position on the Cartesian plane. In other words, we only allow an overshoot of $0.1$ in the vertical direction with respect to the target position for each vehicle. By defining $h(x_{t})=\sum_{i=1}^{2}((\bar{p}^{y})^{[i]}+0.1-(p_{t}^{y})^{[i]})$, we add the term (44) to the loss function (37a).
Upon training without including $\mathcal{L}_{\operatorname{inv}}$ in the cost, the vehicles violate the constraints, on average, $67.49\%$ of the time over 100 runs; typical trajectories are shown in Figure 4. The violation ratio decreases to $5.43\%$ when $\mathcal{L}_{\operatorname{inv}}$ is included, as shown in Figure 6(c), where the gray area indicates the unsafe region to be avoided by the vehicles. Note that shaping the cost through $\mathcal{L}_{\operatorname{inv}}$ is also beneficial if one implements an online safety filter such as [55, 56] during deployment. This is because penalizing $\mathcal{L}_{\operatorname{inv}}$ drastically decreases constraint violations of the closed-loop system, and hence, the suboptimal online intervention of the safety filter would be much less frequent.
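As an illustration of how the soft invariance penalty (44) can be evaluated on a simulated trajectory, the following is a minimal NumPy sketch. The function names, the state layout $x_t=(p_t^{[1]},q_t^{[1]},p_t^{[2]},q_t^{[2]})$ with two-dimensional $p$ and $q$, and the example values are assumptions made for illustration; they are not taken from the released code. For actual training, an automatic-differentiation framework would be needed in place of plain NumPy.

```python
import numpy as np

def h_mountains(x, p_bar_y, margin=0.1):
    """Barrier candidate h(x) = sum_i (p_bar_y[i] + margin - p_y[i]).

    Assumed state layout: x = (p1, q1, p2, q2) with p, q in R^2,
    so the y-positions of the two vehicles sit at indices 1 and 5.
    """
    p_y = np.array([x[1], x[5]])
    return float(np.sum(np.asarray(p_bar_y) + margin - p_y))

def invariance_loss(xs, h, gamma):
    """Penalty (44): sum over t of ReLU(h(x_t) - h(x_{t+1}) - gamma * h(x_t)).

    xs holds the T+1 states x_0, ..., x_T of one closed-loop rollout.
    """
    hs = np.array([h(x) for x in xs])
    residual = hs[:-1] - hs[1:] - gamma * hs[:-1]
    return float(np.sum(np.maximum(residual, 0.0)))

# Hypothetical two-step rollout: vehicle 1 overshoots its target height.
p_bar_y = [2.0, 2.0]
x_safe = np.array([2.0, 2.0, 0.0, 0.0, -2.0, 2.0, 0.0, 0.0])
x_bad = np.array([2.0, 3.0, 0.0, 0.0, -2.0, 2.0, 0.0, 0.0])
h = lambda x: h_mountains(x, p_bar_y)
loss_safe = invariance_loss([x_safe, x_safe], h, gamma=0.5)
loss_bad = invariance_loss([x_safe, x_bad], h, gamma=0.5)
```

The penalty is zero whenever condition (43) holds along the whole rollout and grows with the size of each violation, so it can simply be added to the stage-cost loss.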

Figure 6: Mountains scenario. Closed-loop trajectories after training. (Left and middle) Controller tested over a system with mass uncertainty (-10% and +10%, respectively). (Right) Trained controller with safety promotion through (44). Training initial conditions marked with $\circ$. Snapshots taken at time-instants $\tau$. Colored (gray) lines show the trajectories in $[0,\tau_{i}]$ ($[\tau_{i},\infty)$). Colored balls (and their radius) represent the agents (and their size for collision avoidance).

VI-C Boosting for temporal logic specifications

The success of many policy learning algorithms, e.g., in RL, is highly dependent on the choice of the reward functions for capturing the desired behavior and constraints of an agent. When tasks become complex, specifying loss functions that are the sum over time of stage costs can be restrictive. For instance, consider the case of an agent that must optimally visit a set of locations. A loss function composed of a stage cost summed over time (that is, the one considered in dynamic programming and classical optimal control [59, 3]) cannot easily capture this task, as it would need a priori information about the optimal timings to visit each location. To overcome this problem, one could use more complex loss functions, such as those derived from temporal logic formulations. In particular, truncated linear temporal logic (TLTL) is a specification language leveraging a set of operators defined over finite-time trajectories [60, 61]. It allows incorporating domain knowledge and constraints (in a soft fashion) into the learning process, such as "always avoid obstacles", "eventually visit location $a$", or "do not visit location $b$ until visiting location $a$". Then, using quantitative semantics, one can automatically transform TLTL formulae into real-valued loss functions that are compositions of $\min$ and $\max$ functions over a finite period of time [60, 61].
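To make the min/max structure concrete, the sketch below encodes the toy specification "eventually visit location $a$ and always avoid an obstacle" using the commonly used robustness semantics (maximum over time for "eventually", minimum over time for "always", minimum for conjunction). This is a simplified illustration under those assumptions: the helper names and the circular regions of a given radius are hypothetical, and the exact semantics in [60, 61] may differ in details.

```python
import numpy as np

def rho_inside(traj, center, radius):
    """Per-step robustness of 'be within radius of center' (positive iff inside)."""
    return radius - np.linalg.norm(traj - center, axis=1)

def eventually(rho):
    """'Eventually phi' is as satisfied as its best time step."""
    return float(np.max(rho))

def always(rho):
    """'Always phi' is as satisfied as its worst time step."""
    return float(np.min(rho))

def conj(*values):
    """A conjunction is as satisfied as its least satisfied clause."""
    return float(min(values))

def tltl_loss(traj, goal_a, obstacle, radius):
    """Loss = negative robustness of 'eventually visit a AND always avoid obstacle'."""
    visit_a = eventually(rho_inside(traj, goal_a, radius))
    avoid = always(-rho_inside(traj, obstacle, radius))  # negation flips the sign
    return -conj(visit_a, avoid)

# Hypothetical planar trajectories (rows are time steps).
goal_a = np.array([2.0, 0.0])
obstacle = np.array([1.0, 2.0])
reaching = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
not_reaching = np.array([[0.0, 0.0], [1.0, 0.0]])
loss_reaching = tltl_loss(reaching, goal_a, obstacle, radius=0.5)
loss_not_reaching = tltl_loss(not_reaching, goal_a, obstacle, radius=0.5)
```

A negative loss indicates that the finite trajectory satisfies the specification with some margin, and no visiting times have to be fixed in advance.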

To test the efficacy of TLTL specifications for shaping complex stable closed-loop behavior, we consider the scenario waypoint-tracking, shown in Figure 7, where the two vehicles have to visit a sequence of waypoints while avoiding collisions between them and with the gray obstacles. The blue vehicle's goal is to visit $g_{b}$, then $g_{a}$ and then $g_{c}$, while the goal for the orange vehicle is to visit the waypoints in the following order: $g_{c}$, $g_{b}$ and $g_{a}$. Following [60], the loss formulation for the orange agent is translated into plain English as "Visit $g_{c}$ then $g_{b}$ then $g_{a}$; and don't visit $g_{b}$ or $g_{a}$ until visiting $g_{c}$; and don't visit $g_{a}$ until visiting $g_{b}$; and if visited $g_{c}$, don't visit $g_{c}$ again; and if visited $g_{b}$, don't visit $g_{b}$ again; and always avoid obstacles; and always avoid collisions; and eventually stay at the final goal." Its mathematical formulation can be found in Appendix -A2.

Figure 7 shows the waypoint-tracking scenario before and after the training of a performance-boosting controller. As described in Section V-B, we use a REN with $q=r=32$ for approximating the $\mathcal{L}_{2}$ operator $\boldsymbol{\mathcal{M}}$. Furthermore, we allow for a time-varying bias of the form $b_{t}^{\top}=\begin{bmatrix}0_{1\times q}&0_{1\times r}&b_{w,t}^{\top}\end{bmatrix}$ in (38), with $b_{w,t}=0$ for $t>T$. While the system always starts at the same initial condition indicated with '$\circ$', the data consists of disturbance sequences $w_{T:0}$ with fixed $w_{0}$ and $w_{T:1}$ as i.i.d. samples drawn from a Gaussian distribution with zero mean and standard deviation of $0.01$. Our result highlights the power of complex costs, expressed through the TLTL loss function, which promote vehicles visiting the predefined waypoints in the correct order while avoiding collisions between them and with the obstacles.
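The structure of this dataset (a fixed $w_0$ followed by small i.i.d. Gaussian disturbances) can be sketched as follows. The function name, the array layout, and the horizon and sample counts in the usage lines are assumptions for illustration only; the text above specifies just the zero mean and the standard deviation of $0.01$.

```python
import numpy as np

def sample_disturbances(w0, horizon, num_samples, std=0.01, seed=0):
    """Return an array of shape (num_samples, horizon + 1, dim).

    Entry [k, 0] equals the fixed w0 for every sample k, while entries
    [k, 1:] are i.i.d. zero-mean Gaussian draws with standard deviation std.
    """
    rng = np.random.default_rng(seed)
    w0 = np.asarray(w0, dtype=float)
    w = rng.normal(0.0, std, size=(num_samples, horizon + 1, w0.size))
    w[:, 0, :] = w0  # the initial condition is shared by all sequences
    return w

# Hypothetical usage: 8-dimensional state, horizon T = 10, 5 sequences.
w = sample_disturbances(np.zeros(8), horizon=10, num_samples=5)
```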

Figure 7: Waypoint-tracking scenario. Closed-loop trajectories before training (left) and after training (middle and right). Snapshots taken at time-instants $\tau$. Colored (gray) lines show the trajectories in $[0,\tau_{i}]$ ($[\tau_{i},\infty)$). Colored balls (and their radius) represent the agents (and their size for collision avoidance).

VII Conclusion

Embedding safety and stability emerges as a crucial challenge when control systems are equipped with high-performance machine learning components. This work aims to contribute to this rapidly developing field by uncovering the theoretical and computational potential of IMC for safely boosting the performance of closed-loop nonlinear systems with machine learning models such as DNNs.

The results of this work open up several future research directions. First, motivated by the recent results of [49], it would be relevant to apply statistical learning theory to rigorously assess the generalization capabilities of performance-boosting controllers in uncertain environments and over extended timeframes. Second, drawing on insights from [62], integrating extensive RL-based offline learning with real-time adjustments similar to MPC presents a promising approach. Third, within the IMC framework, there is a significant opportunity to develop richer parametrizations of stable dynamical systems in $\mathcal{L}_{p}$, and to theoretically prove their approximation capabilities. Lastly, building upon [63], it is interesting to explore how learning-based IMC methods could generate new optimization algorithms with formal guarantees for tackling complex optimal control and machine learning tasks.

-A Implementation details for the numerical experiments in Section VI

We set $m^{[i]}=b^{[i]}_{1}={k'}^{[i]}_{1}={k'}^{[i]}_{2}=1$ and $b^{[i]}_{2}=0.5$ as the parameters for each vehicle $i$ in the model (39) with the pre-stabilizing controller (40). The collision-avoidance radius of each agent is 0.5.

-A1 Mountains scenario

As shown in Figure 4, the vehicles start at $p^{[1]}_{0}=(-2,-2)$ and $p^{[2]}_{0}=(2,-2)$, and their goal is to go to the target positions $\bar{p}^{[1]}=(2,2)$ and $\bar{p}^{[2]}=(-2,2)$, respectively. The training data consists of $100$ initial positions sampled from a Gaussian distribution around the initial position with a standard deviation of $0.5$.
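The sampling of training initial conditions described above can be sketched as follows. The function name, the random seed, and the stacking order (position followed by two zero entries per vehicle, matching the form of $w_0$ used in Section VI-A) are assumptions for illustration and not the released implementation.

```python
import numpy as np

def sample_initial_conditions(p0_nominal, num_samples=100, std=0.5, seed=0):
    """Sample w_0 = (p^[1], 0, 0, p^[2], 0, 0, ...) around nominal positions.

    p0_nominal has shape (n_agents, 2); the result has shape
    (num_samples, 4 * n_agents), with zeros in the q-components.
    """
    rng = np.random.default_rng(seed)
    p0_nominal = np.asarray(p0_nominal, dtype=float)
    n_agents = p0_nominal.shape[0]
    noise = rng.normal(0.0, std, size=(num_samples, n_agents, 2))
    positions = p0_nominal + noise            # Gaussian around the nominal start
    q0 = np.zeros((num_samples, n_agents, 2))  # remaining state entries start at 0
    return np.concatenate([positions, q0], axis=2).reshape(num_samples, -1)

# Nominal start positions of the two vehicles in the mountains scenario.
w0_train = sample_initial_conditions([(-2.0, -2.0), (2.0, -2.0)])
```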

Let $\bar{x}=(\bar{x}^{[1]},\bar{x}^{[2]})$ with $\bar{x}^{[i]}=(\bar{p}^{[i]},0_{2})$. The terms of the cost function (41) are defined as follows:

$$l_{traj}(x_{t},u_{t})=(x_{t}-\bar{x})^{\top}\tilde{Q}(x_{t}-\bar{x})+\alpha_{u}u_{t}^{\top}u_{t}$$
$$l_{ca}(x_{t})=\begin{cases}\alpha_{ca}\sum_{i=0}^{N}\sum_{j,\,i\neq j}(d^{i,j}_{t}+\epsilon)^{-2}&\text{if}\ d^{i,j}_{t}\leq D_{\text{safe}}\,,\\ 0&\text{otherwise}\,,\end{cases}$$

where $\tilde{Q}\succ 0$ and $\alpha_{u},\alpha_{ca}>0$ are hyperparameters, $d^{i,j}_{t}=|p^{[i]}_{t}-p^{[j]}_{t}|_{2}\geq 0$ denotes the distance between agents $i$ and $j$, $\epsilon>0$ is a small fixed constant that keeps the loss bounded for all distance values, and $D_{\text{safe}}$ is a safe distance between the centers of mass of the agents; we set it to 1.2.
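The two cost terms above can be sketched in NumPy as follows. This is an illustrative reading, not the released code: the function names and the default value of `eps` are assumptions, and the case distinction in $l_{ca}$ is interpreted as summing only over ordered pairs $(i,j)$, $i\neq j$, whose distance does not exceed $D_{\text{safe}}$.

```python
import numpy as np

def l_traj(x, u, x_bar, Q_tilde, alpha_u):
    """Quadratic tracking and control-energy cost."""
    e = x - x_bar
    return float(e @ Q_tilde @ e + alpha_u * (u @ u))

def l_ca(positions, alpha_ca, d_safe=1.2, eps=1e-3):
    """Collision-avoidance cost over ordered pairs (i, j), i != j.

    positions has shape (n_agents, 2). Pairs farther apart than d_safe
    contribute nothing; eps keeps the cost bounded at zero distance.
    """
    total = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = np.linalg.norm(positions[i] - positions[j])
            if d <= d_safe:
                total += (d + eps) ** -2
    return alpha_ca * total

# Hypothetical values for a quick check.
close = np.array([[0.0, 0.0], [1.0, 0.0]])
far = np.array([[0.0, 0.0], [5.0, 0.0]])
cost_close = l_ca(close, alpha_ca=1.0)
cost_far = l_ca(far, alpha_ca=1.0)
cost_traj = l_traj(np.ones(8), np.array([1.0, 2.0]), np.ones(8), np.eye(8), alpha_u=0.5)
```

Because each unordered pair appears twice in the double sum, two agents at distance $d\leq D_{\text{safe}}$ contribute $2\alpha_{ca}(d+\epsilon)^{-2}$ under this reading.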

Motivated by [64], we represent the obstacles based on a Gaussian density function

$$\eta(z;\mu,\Sigma)=\frac{1}{2\pi\sqrt{\det(\Sigma)}}\exp\left(-\frac{1}{2}\left(z-\mu\right)^{\top}\Sigma^{-1}\left(z-\mu\right)\right)\,,$$

with mean $\mu\in\mathbb{R}^{2}$ and covariance $\Sigma\in\mathbb{R}^{2\times 2}$ with $\Sigma\succ 0$. The term $l_{obs}(x_{t})$ is given by

$$\begin{aligned}
l_{obs}(x_{t})=\alpha_{obs}\sum_{i=0}^{2}\Bigg(&\eta\left(p^{[i]}_{t};\begin{bmatrix}2.5\\ 0\end{bmatrix},0.2\,I\right)+\eta\left(p^{[i]}_{t};\begin{bmatrix}-2.5\\ 0\end{bmatrix},0.2\,I\right)\\
&+\eta\left(p^{[i]}_{t};\begin{bmatrix}1.5\\ 0\end{bmatrix},0.2\,I\right)+\eta\left(p^{[i]}_{t};\begin{bmatrix}-1.5\\ 0\end{bmatrix},0.2\,I\right)\Bigg)\,.
\end{aligned}$$

For the hyperparameters, we set $\alpha_{u}=2.5\times 10^{-4}$, $\alpha_{ca}=100$, $\alpha_{obs}=5\times 10^{3}$ and $Q=I_{4}$. We minimize the loss function using stochastic gradient descent with Adam, with a learning rate of $1\times 10^{-4}$. We train for $5\times 10^{3}$ epochs with a batch size of one trajectory.
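The obstacle term can be sketched in a few lines. The code below is an illustrative, standard-library version and not the implementation used for the experiments: it exploits the fact that every obstacle has covariance $0.2\,I$, so the density reduces to the isotropic case with $\det(\Sigma)=\texttt{var}^{2}$ and $\Sigma^{-1}=I/\texttt{var}$. The names `gaussian_density`, `obstacle_loss` and `OBSTACLE_CENTERS` are hypothetical.

```python
import math

# Obstacle means as listed in l_obs; all share the covariance 0.2 * I.
OBSTACLE_CENTERS = [(2.5, 0.0), (-2.5, 0.0), (1.5, 0.0), (-1.5, 0.0)]


def gaussian_density(z, mu, var):
    """2-D Gaussian density eta(z; mu, var * I).

    Assumption: isotropic covariance, so sqrt(det(Sigma)) = var and the
    quadratic form is the squared Euclidean distance divided by var.
    """
    sq_dist = (z[0] - mu[0]) ** 2 + (z[1] - mu[1]) ** 2
    return math.exp(-0.5 * sq_dist / var) / (2.0 * math.pi * var)


def obstacle_loss(positions, alpha_obs=5e3, var=0.2):
    """Sum the obstacle densities over all agent positions at one time step."""
    return alpha_obs * sum(
        gaussian_density(p, c, var)
        for p in positions
        for c in OBSTACLE_CENTERS
    )


if __name__ == "__main__":
    print(obstacle_loss([(2.5, 0.0)]))  # agent on top of an obstacle mean
    print(obstacle_loss([(0.0, 3.0)]))  # agent far from every obstacle
```

The penalty is smooth everywhere, which is what makes it convenient for gradient-based training: it is largest at the obstacle means and decays quickly with distance.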

-A2 Waypoint-tracking scenario

As shown in Figure 4, the vehicles start at $p^{[1]}_{0}=(-2,0)$ and $p^{[2]}_{0}=(0,0)$. The goal points $g_{a}$, $g_{b}$ and $g_{c}$ are located at $(-2,-2)$, $(0,2)$ and $(2,-2)$, respectively. To describe the TLTL loss, let us define, for each vehicle, the following functions of time:

  • $d^{g_{i}}_{t}$, for $i=1,2,3$, is the distance between the vehicle and the goal point $g_{i}$;

  • $d^{o_{i}}_{t}$, for $i=1,2$, is the distance between the vehicle and the $i^{\text{th}}$ obstacle;

  • $d^{rob}_{t}$ is the distance between the two vehicles;

where $g_{1}$, $g_{2}$ and $g_{3}$ are the waypoints in the correct visiting order, for each vehicle. Following the notation of [60], the temporal logic form of the cost function, for each vehicle, is

$$\begin{aligned}
&\left(\psi_{g_{1}}\,\mathcal{T}\,\psi_{g_{2}}\,\mathcal{T}\,\psi_{g_{3}}\right)\wedge\left(\lnot\left(\psi_{g_{2}}\vee\psi_{g_{3}}\right)\,\mathcal{U}\,\psi_{g_{1}}\right)\wedge\left(\lnot\psi_{g_{3}}\,\mathcal{U}\,\psi_{g_{2}}\right)\\
&\wedge\left(\bigwedge_{i=1,2,3}\square\left(\psi_{g_{i}}\Rightarrow\bigcirc\square\lnot\psi_{g_{i}}\right)\right)\wedge\left(\bigwedge_{i=1,2}\square\psi_{o_{i}}\right)\\
&\wedge\square\psi_{coll}\wedge\lozenge\square\psi_{g_{3}}
\end{aligned}\tag{45}$$

where $\psi$ are predicates defined in Table I, and $r_{obs}=1.7$ and $r_{rob}=0.5$ are the radii of the obstacles and vehicles, respectively (note that, in the waypoint-tracking scenario, we do not model the obstacles with a Gaussian density function). The Boolean operators $\lnot$, $\vee$, $\wedge$, and $\Rightarrow$ stand for negation (not), disjunction (or), conjunction (and), and implication. The temporal operators $\mathcal{T}$, $\mathcal{U}$, $\bigcirc$, $\lozenge$, and $\square$ stand for ‘then’, ‘until’, ‘next’, ‘eventually’, and ‘always’. Each term can be translated automatically into a mathematical expression following [60, 61]. For instance, $\square\psi_{coll}$ translates into

$$\min_{t\in[0,T]}(d^{rob}_{t}-2r_{rob}),$$

and $\square\left(\psi_{g_{i}}\Rightarrow\bigcirc\square\lnot\psi_{g_{i}}\right)$ translates into

$$\min_{t\in[0,T]}\max\Big(-(0.05-d^{g_{i}}_{t})\,,\ \min_{\tilde{t}\in[t+1,T]}-(0.05-d^{g_{i}}_{\tilde{t}})\Big).$$

The full mathematical expression of (45), which can be obtained following [60], is implemented in our GitHub repository.
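The two translations given above can be sketched on a finite, sampled trajectory as follows. This is a minimal illustration and not the repository implementation: the function names are hypothetical, the inner minimum is assumed to range over the future distances $d^{g_{i}}_{\tilde{t}}$, and the empty future window at the last time step is assumed to be vacuously satisfied (robustness $+\infty$). A positive return value means that the corresponding sub-formula holds on the trajectory, with the magnitude acting as a margin.

```python
import math


def always_no_collision(d_rob, r_rob=0.5):
    """Robustness of 'always psi_coll': min over t of (d_rob[t] - 2 * r_rob)."""
    return min(d - 2.0 * r_rob for d in d_rob)


def always_leave_goal(d_goal, tol=0.05):
    """Robustness of 'always (psi_g implies next always not psi_g)'.

    d_goal[t] is the distance to the goal at step t, and psi_g holds when
    d_goal[t] < tol. Assumption: at the final step the future window is
    empty and the consequent is treated as satisfied.
    """
    horizon = len(d_goal)
    values = []
    for t in range(horizon):
        # Robustness of 'not psi_g' at time t.
        not_at_goal_now = -(tol - d_goal[t])
        # Robustness of 'next always not psi_g': worst case over the future.
        future = [-(tol - d_goal[s]) for s in range(t + 1, horizon)]
        never_again = min(future) if future else math.inf
        # Implication a => b has robustness max(-rho(a), rho(b)).
        values.append(max(not_at_goal_now, never_again))
    return min(values)


if __name__ == "__main__":
    print(always_no_collision([2.0, 1.5, 1.2]))
    print(always_leave_goal([1.0, 0.01, 1.0]))   # touches the goal once
    print(always_leave_goal([1.0, 0.01, 0.01]))  # lingers at the goal
```

Since these expressions are built only from `min` and `max` over the trajectory, they are differentiable almost everywhere, which is what allows such specifications to be used as training losses in the manner of [60, 61].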

Predicate          Expression
$\psi_{g_{1}}$     $d^{g_{1}}<0.05$
$\psi_{g_{2}}$     $d^{g_{2}}<0.05$
$\psi_{g_{3}}$     $d^{g_{3}}<0.05$
$\psi_{o_{1}}$     $d^{o_{1}}>r_{obs}$
$\psi_{o_{2}}$     $d^{o_{2}}>r_{obs}$
$\psi_{coll}$      $d^{rob}>2\,r_{rob}$
TABLE I: Predicates used in the TLTL formulation of (45).

We also add a small regularization term that encourages the vehicles to stay close to the final target point, which reads $\alpha_{\text{reg}}\left\lVert x_{t}-\bar{x}\right\rVert^{2}$, with $\alpha_{\text{reg}}=1\times 10^{-4}$. We minimize the loss function using stochastic gradient descent with Adam, with a learning rate of $5\times 10^{-4}$. We train for 3000 epochs with a batch size of one trajectory.

References

  • [1] A. M. Annaswamy, K. H. Johansson, and G. J. Pappas, “Control for societal-scale challenges: Road map 2030,” IEEE Control Systems Society Publication, 2023.
  • [2] S. Sastry, Nonlinear systems: analysis, stability, and control.   Springer Science & Business Media, 2013, vol. 10.
  • [3] D. P. Bertsekas, “Dynamic programming and optimal control: Vol. I-II,” Belmont, MA: Athena Scientific, 2011.
  • [4] L. S. Pontryagin, Mathematical theory of optimal processes.   Routledge, 2018.
  • [5] J. B. Rawlings, D. Q. Mayne, and M. Diehl, Model predictive control: theory, computation, and design.   Nob Hill Publishing Madison, WI, 2017, vol. 2.
  • [6] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction.   MIT press, 2018.
  • [7] L. Brunke, M. Greeff, A. W. Hall, Z. Yuan, S. Zhou, J. Panerati, and A. P. Schoellig, “Safe learning in robotics: From learning-based control to safe reinforcement learning,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 5, pp. 411–444, 2022.
  • [8] J. Lee, J. Hwangbo, L. Wellhausen, V. Koltun, and M. Hutter, “Learning quadrupedal locomotion over challenging terrain,” Science robotics, vol. 5, no. 47, p. eabc5986, 2020.
  • [9] Y. Song, M. Steinweg, E. Kaufmann, and D. Scaramuzza, “Autonomous drone racing with deep reinforcement learning,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2021, pp. 1205–1212.
  • [10] E. Kaufmann, L. Bauersfeld, A. Loquercio, M. Müller, V. Koltun, and D. Scaramuzza, “Champion-level drone racing using deep reinforcement learning,” Nature, vol. 620, no. 7976, pp. 982–987, 2023.
  • [11] F. Berkenkamp, M. Turchetta, A. P. Schoellig, and A. Krause, “Safe model-based reinforcement learning with stability guarantees,” Advances in Neural Information Processing Systems 30, vol. 2, pp. 909–919, 2018.
  • [12] M. Zanon and S. Gros, “Safe reinforcement learning using robust MPC,” IEEE Transactions on Automatic Control, vol. 66, no. 8, pp. 3638–3652, 2020.
  • [13] M. Jin and J. Lavaei, “Stability-certified reinforcement learning: A control-theoretic perspective,” IEEE Access, vol. 8, pp. 229 086–229 100, 2020.
  • [14] T. Parisini and R. Zoppoli, “A receding-horizon regulator for nonlinear systems and a neural approximation,” Automatica, vol. 31, no. 10, pp. 1443–1451, Oct. 1995.
  • [15] T. Parisini, M. Sanguineti, and R. Zoppoli, “Nonlinear stabilization by receding-horizon neural regulators,” International Journal of Control, vol. 70, no. 3, pp. 341–362, Jan. 1998.
  • [16] A. Levin and K. Narendra, “Control of nonlinear dynamical systems using neural networks. II. Observability, identification, and control,” IEEE Transactions on Neural Networks, vol. 7, no. 1, pp. 30–42, Jan. 1996.
  • [17] F. Gu, H. Yin, L. El Ghaoui, M. Arcak, P. Seiler, and M. Jin, “Recurrent neural network controllers synthesis with stability guarantees for partially observed systems,” in AAAI, 2022, pp. 5385–5394.
  • [18] P. Pauli, J. Köhler, J. Berberich, A. Koch, and F. Allgöwer, “Offset-free setpoint tracking using neural network controllers,” in Learning for Dynamics and Control.   PMLR, 2021, pp. 992–1003.
  • [19] R. Wang, N. H. Barbara, M. Revay, and I. Manchester, “Learning over all stabilizing nonlinear controllers for a partially-observed linear system,” IEEE Control Systems Letters, vol. 7, pp. 91–96, 2022.
  • [20] R. Wang and I. R. Manchester, “Youla-REN: Learning nonlinear feedback policies with robust stability guarantees,” in 2022 American Control Conference (ACC).   IEEE, 2022, pp. 2116–2123.
  • [21] L. Furieri, C. L. Galimberti, M. Zakwan, and G. Ferrari-Trecate, “Distributed neural network control with dependability guarantees: a compositional port-Hamiltonian approach,” in Learning for Dynamics and Control Conference.   PMLR, 2022, pp. 571–583.
  • [22] L. Furieri, C. L. Galimberti, and G. Ferrari-Trecate, “Neural system level synthesis: Learning over all stabilizing policies for nonlinear systems,” in 2022 IEEE 61st Conference on Decision and Control (CDC).   IEEE, 2022, pp. 2765–2770.
  • [23] N. H. Barbara, R. Wang, and I. R. Manchester, “Learning over contracting and Lipschitz closed-loops for partially-observed nonlinear systems,” in 2023 62nd IEEE Conference on Decision and Control (CDC).   IEEE, 2023, pp. 1028–1033.
  • [24] C. E. Garcia and M. Morari, “Internal model control. A unifying review and some new results,” Industrial & Engineering Chemistry Process Design and Development, vol. 21, no. 2, pp. 308–323, 1982.
  • [25] C. G. Economou, M. Morari, and B. O. Palsson, “Internal model control: Extension to nonlinear system,” Industrial & Engineering Chemistry Process Design and Development, vol. 25, no. 2, pp. 403–411, 1986.
  • [26] F. Bonassi and R. Scattolini, “Recurrent neural network-based internal model control design for stable nonlinear systems,” European Journal of Control, vol. 65, p. 100632, 2022.
  • [27] V. Anantharam and C. A. Desoer, “On the stabilization of nonlinear systems,” IEEE Transactions on Automatic Control, vol. 29, no. 6, pp. 569–572, 1984.
  • [28] K. Fujimoto and T. Sugie, “State-space characterization of Youla parametrization for nonlinear systems based on input-to-state stability,” in Proceedings of the 37th IEEE Conference on Decision and Control, vol. 3.   IEEE, 1998, pp. 2479–2484.
  • [29] D. Ho, “A system level approach to discrete-time nonlinear systems,” in 2020 American Control Conference (ACC).   IEEE, 2020, pp. 1625–1630.
  • [30] K.-K. K. Kim, E. R. Patrón, and R. D. Braatz, “Standard representation and unified stability analysis for dynamic artificial neural network models,” Neural Networks, vol. 98, pp. 251–262, 2018.
  • [31] M. Revay, R. Wang, and I. R. Manchester, “Recurrent equilibrium networks: Flexible dynamic models with guaranteed stability and robustness,” IEEE Transactions on Automatic Control, 2023.
  • [32] Y. Tang, Y. Zheng, and N. Li, “Analysis of the optimization landscape of linear quadratic Gaussian (LQG) control,” in Learning for Dynamics and Control.   PMLR, 2021, pp. 599–610.
  • [33] L. Furieri and M. Kamgarpour, “First order methods for globally optimal distributed controllers beyond quadratic invariance,” in 2020 American Control Conference (ACC).   IEEE, 2020, pp. 4588–4593.
  • [34] D. E. Rivera, M. Morari, and S. Skogestad, “Internal model control: PID controller design,” Industrial & Engineering Chemistry Process Design and Development, vol. 25, no. 1, pp. 252–265, 1986.
  • [35] K. Zhou and J. C. Doyle, Essentials of robust control.   Prentice hall Upper Saddle River, NJ, 1998, vol. 104.
  • [36] M. W. Fisher, G. Hug, and F. Dörfler, “Approximation by simple poles–part I: Density and geometric convergence rate in Hardy space,” IEEE Transactions on Automatic Control, 2023.
  • [37] ——, “Approximation by simple poles–part II: System level synthesis beyond finite impulse response,” arXiv preprint arXiv:2203.16765, 2022.
  • [38] L. Furieri, Y. Zheng, A. Papachristodoulou, and M. Kamgarpour, “Sparsity invariance for convex design of distributed controllers,” IEEE Transactions on Control of Network Systems, vol. 7, no. 4, pp. 1836–1847, 2020.
  • [39] R. Wang, N. H. Barbara, M. Revay, and I. R. Manchester, “Learning over all stabilizing nonlinear controllers for a partially-observed linear system,” IEEE Control Systems Letters, vol. 7, pp. 91–96, 2022.
  • [40] Y.-S. Wang, N. Matni, and J. C. Doyle, “A system-level approach to controller synthesis,” IEEE Transactions on Automatic Control, vol. 64, no. 10, pp. 4079–4093, 2019.
  • [41] L. Conger, J. S. L. Li, E. Mazumdar, and S. L. Brunton, “Nonlinear system level synthesis for polynomial dynamical systems,” in 2022 IEEE 61st Conference on Decision and Control (CDC).   IEEE, 2022, pp. 3846–3852.
  • [42] G. Zames, “On the input-output stability of time-varying nonlinear feedback systems part one: Conditions derived using concepts of loop gain, conicity, and positivity,” IEEE transactions on automatic control, vol. 11, no. 2, pp. 228–238, 1966.
  • [43] L. Massai, D. Saccani, L. Furieri, and G. Ferrari-Trecate, “Unconstrained learning of networked nonlinear systems via free parametrization of stable interconnected operators,” arXiv preprint arXiv:2311.13967, 2023.
  • [44] D. Saccani, L. Massai, L. Furieri, and G. Ferrari-Trecate, “Optimal distributed control with stability guarantees by training a network of neural closed-loop maps,” arXiv preprint arXiv:2404.02820, 2024.
  • [45] D. Youla, H. Jabr, and J. Bongiorno, “Modern Wiener-Hopf design of optimal controllers–Part II: The multivariable case,” IEEE Transactions on Automatic Control, vol. 21, no. 3, pp. 319–338, 1976.
  • [46] L. Furieri, Y. Zheng, A. Papachristodoulou, and M. Kamgarpour, “An input–output parametrization of stabilizing controllers: amidst Youla and system level synthesis,” IEEE Control Systems Letters, 2019.
  • [47] Y. Zheng, L. Furieri, M. Kamgarpour, and N. Li, “System-level, input–output and new parameterizations of stabilizing controllers, and their numerical computation,” Automatica, vol. 140, p. 110211, 2022.
  • [48] M. Fazel, R. Ge, S. Kakade, and M. Mesbahi, “Global convergence of policy gradient methods for the linear quadratic regulator,” in International Conference on Machine Learning.   PMLR, 2018, pp. 1467–1476.
  • [49] M. G. Boroujeni, C. L. Galimberti, A. Krause, and G. Ferrari-Trecate, “A PAC-Bayesian framework for optimal control with stability guarantees,” arXiv preprint arXiv:2403.17790, 2024.
  • [50] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, “TensorFlow: Large-scale machine learning on heterogeneous systems,” 2015. [Online]. Available: https://www.tensorflow.org/
  • [51] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, “PyTorch: An imperative style, high-performance deep learning library,” in Advances in Neural Information Processing Systems 32.   Curran Associates, Inc., 2019, pp. 8024–8035.
  • [52] D. Martinelli, C. L. Galimberti, I. R. Manchester, L. Furieri, and G. Ferrari-Trecate, “Unconstrained parametrization of dissipative and contracting neural ordinary differential equations,” in 2023 62nd IEEE Conference on Decision and Control (CDC).   IEEE, 2023, pp. 3043–3048.
  • [53] S. Bai, J. Z. Kolter, and V. Koltun, “Deep equilibrium models,” in Advances in Neural Information Processing Systems (NeurIPS), 2019.
  • [54] M. Zakwan and G. Ferrari-Trecate, “Neural distributed controllers with port-Hamiltonian structures,” arXiv preprint arXiv:2403.17785, 2024.
  • [55] L. Hewing, K. P. Wabersich, M. Menner, and M. N. Zeilinger, “Learning-based model predictive control: Toward safe learning in control,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 3, pp. 269–296, 2020.
  • [56] K. P. Wabersich and M. N. Zeilinger, “A predictive safety filter for learning-based control of constrained nonlinear dynamical systems,” Automatica, vol. 129, p. 109597, 2021.
  • [57] A. D. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath, and P. Tabuada, “Control barrier functions: Theory and applications,” in 2019 18th European control conference (ECC).   IEEE, 2019, pp. 3420–3431.
  • [58] A. Agrawal and K. Sreenath, “Discrete control barrier functions for safety-critical control of discrete systems with application to bipedal robot navigation.” in Robotics: Science and Systems, vol. 13.   Cambridge, MA, USA, 2017, pp. 1–10.
  • [59] D. Q. Mayne, J. B. Rawlings, C. V. Rao, and P. O. Scokaert, “Constrained model predictive control: Stability and optimality,” Automatica, vol. 36, no. 6, pp. 789–814, 2000.
  • [60] X. Li, C.-I. Vasile, and C. Belta, “Reinforcement learning with temporal logic rewards,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2017, pp. 3834–3839.
  • [61] K. Leung, N. Aréchiga, and M. Pavone, “Backpropagation through signal temporal logic specifications: Infusing logical structure into gradient-based methods,” The International Journal of Robotics Research, vol. 42, no. 6, pp. 356–370, 2023.
  • [62] D. Bertsekas, Lessons from AlphaZero for optimal, model predictive, and adaptive control.   Athena Scientific, 2022.
  • [63] A. Martin and L. Furieri, “Learning to optimize with convergence guarantees using nonlinear system theory,” arXiv preprint arXiv:2403.09389, 2024.
  • [64] D. Onken, L. Nurbekyan, X. Li, S. W. Fung, S. Osher, and L. Ruthotto, “A neural network approach applied to multi-agent optimal control,” in IEEE European Control Conference (ECC), 2021, pp. 1036–1041.