License: arXiv.org perpetual non-exclusive license
arXiv:2210.08387v3 [math.OC] 09 Jan 2024

Time-Varying Semidefinite Programming:
Path Following a Burer–Monteiro Factorization

Antonio Bellon, Faculty of Electrical Engineering, Czech Technical University in Prague, Karlovo Namesti 13, Prague 121 35, the Czech Republic
Mareike Dressler, School of Mathematics and Statistics, University of New South Wales, Sydney, NSW 2052, Australia
Vyacheslav Kungurtsev*
Jakub Mareček*
André Uschmajew, Institute of Mathematics & Centre for Advanced Analytics and Predictive Sciences, University of Augsburg, 86159 Augsburg, Germany
Abstract

We present an online algorithm for time-varying semidefinite programs (TV-SDPs), based on tracking the solution trajectory of a low-rank matrix factorization, also known as the Burer–Monteiro factorization, in a path-following procedure. In this procedure, a predictor-corrector algorithm solves a sequence of linearized systems. This requires the introduction of a horizontal space constraint to ensure the local injectivity of the low-rank factorization. The method produces a sequence of approximate solutions for the original TV-SDP problem, which we show stay close to the optimal solution path if properly initialized. Numerical experiments for a time-varying Max-Cut SDP relaxation demonstrate the computational advantages of the proposed method for tracking TV-SDPs in terms of runtime compared to off-the-shelf interior point methods.

Key words. Semidefinite programming; nonlinear programming; parametric optimization;
time-varying constrained optimization; Newton type methods

MSC codes. 49M15, 90C22, 90C30, 90C31

1 Introduction

Semidefinite programs (SDPs) constitute an important class of convex constrained optimization problems that is ubiquitous in statistics, signal processing, control systems, and other areas. In several applications, the data of the problem vary over time, in which case the problem can be modeled as a time-varying SDP (TV-SDP). In this paper we consider TV-SDPs of the form

$$
\begin{aligned}
\min_{X\in\mathbb{S}^n}\quad & \langle C_t, X\rangle\\
\text{s.t.}\quad & \mathcal{A}_t(X) = b_t,\\
& X\succeq 0,
\end{aligned}
\qquad (\mathrm{SDP}_t)
$$

where $t\in[0,T]$ is a time parameter varying on a bounded interval. Here $\mathbb{S}^n$ denotes the space of real symmetric $n\times n$ matrices, $\mathcal{A}_t\colon\mathbb{S}^n\to\mathbb{R}^m$ is a linear operator defined by $\mathcal{A}_t(X) = (\langle A_{1,t},X\rangle,\dots,\langle A_{m,t},X\rangle)$ for some $A_{1,t},\dots,A_{m,t}\in\mathbb{S}^n$, $b_t\in\mathbb{R}^m$, and $C_t\in\mathbb{S}^n$. Throughout the paper $\langle\cdot,\cdot\rangle$ denotes the Frobenius inner product, and the constraint $X\succeq 0$ requires $X\in\mathbb{S}^n$ to be positive semidefinite.
In this time-varying setting, one looks for a solution curve $t\mapsto X_t$ in $\mathbb{S}^n$ such that $X = X_t$ is an optimal solution of (SDP$_t$) at each time point $t\in[0,T]$.

Time-dependent problems leading to TV-SDPs occur in various applications, such as optimal power flow problems in power systems [38], state estimation problems in quantum systems [1], modeling of energy economic problems [21], job-shop scheduling problems [7], as well as problems arising in signal processing, queueing theory [41], or aircraft engineering [52]. TV-SDPs can be seen as a generalization of continuous linear programming problems, which were first studied by Bellman [11] in relation to so-called bottleneck problems in multistage linear production economic processes. Since then, a large body of literature has been devoted to studying continuous linear programs with and without additional assumptions. However, the generalization of this idea to other classes of optimization problems has only recently been considered. In [55], Wang, Zhang, and Yao study continuous conic programs, and finally Ahmadi and Khadir [3] consider time-varying SDPs. In contrast to our setting, they require the data to vary polynomially with time and also restrict themselves to polynomial solutions. Moreover, the problems studied there involve kernel terms and more complicated constraints, while our work addresses TV-SDPs in a simpler sense of univariate parametric SDPs, following the literature thread of [26, 30].

A naive approach to solving the time-varying problem (SDP$_t$) is to consider, at a sequence of times $\{t_k\}_{k\in\{1,\dots,K\}}\subseteq[0,T]$, the instances (SDP$_{t_k}$) for $k\in\{1,\dots,K\}$ and solve them one after another. The standard solvers for SDPs are interior point methods [34, 53, 6, 24, 35], which can solve them to a prescribed accuracy in time polynomial in the input size. However, these solvers do not scale particularly well, and thus this brute-force approach may fail in applications where the volume and velocity of the data are large. Furthermore, such a straightforward method would not make use of the local information collected by solving the previous instances of the problem. Even if one considers warm starts [27, 28, 23, 20, 51], the reduction in runtime is likely to be marginal. For instance, [51, sections 5.5 and 5.6] reports a 30–60% reduction of the runtime on a collection of time-varying instances of their own choice.

Instead, in this work, we use the idea of so-called path-following predictor-corrector algorithms as developed in [29, 5]. In classical predictor-corrector methods, a predictor step approximates the derivative of the solution with respect to the time parameter and extrapolates along it to the next time point, and a corrector step then moves the current approximate solution closer to the solution at the new time point. The corrector is based on a Newton step for solving the first-order (KKT) optimality conditions.
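To make the predictor-corrector idea concrete, the following Python sketch follows a solution curve $x(t)$ of a generic smooth system $F(x,t)=0$: an Euler predictor uses the derivative $\dot x = -(\partial_x F)^{-1}\partial_t F$ given by the implicit function theorem, and a Newton corrector is applied at the new time point. It is only an illustration of the general principle, assuming $\partial_x F$ is invertible along the path; it is not the algorithm for (BM$_t$) developed in this paper, where $F$ would be a KKT system. The function name `track_root` and the toy problem $F(x,t)=x^2-(1+t)$ are hypothetical.

```python
import numpy as np


def track_root(F, dF_dx, dF_dt, x0, t_grid, newton_steps=1):
    """Follow a solution curve x(t) of F(x, t) = 0 along the time grid t_grid."""
    xs = [np.asarray(x0, dtype=float)]
    for t, t_next in zip(t_grid[:-1], t_grid[1:]):
        x = xs[-1]
        h = t_next - t
        # Predictor: Euler step along dx/dt = -(dF/dx)^{-1} dF/dt,
        # obtained by differentiating F(x(t), t) = 0 with respect to t.
        x = x - h * np.linalg.solve(dF_dx(x, t), dF_dt(x, t))
        # Corrector: Newton iterations for F(., t_next) = 0 at the new time point.
        for _ in range(newton_steps):
            x = x - np.linalg.solve(dF_dx(x, t_next), F(x, t_next))
        xs.append(x)
    return np.array(xs)


# Hypothetical toy problem: x(t) = sqrt(1 + t) solves x^2 - (1 + t) = 0.
def F(x, t):
    return x**2 - (1.0 + t)


def dF_dx(x, t):
    return np.diag(2.0 * x)


def dF_dt(x, t):
    return -np.ones_like(x)


ts = np.linspace(0.0, 1.0, 11)
path = track_root(F, dF_dx, dF_dt, [1.0], ts)
print(np.max(np.abs(path[:, 0] - np.sqrt(1.0 + ts))))  # small tracking error
```

In this toy example a single Newton step per time point suffices, because the predictor error is $O(h^2)$ and Newton's method converges quadratically near the curve.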

A limiting factor in solving both stationary and time-dependent SDPs is the computational complexity when $n$ is large. A common remedy is the Burer–Monteiro approach, as presented in the seminal works [16, 17]. In this approach, a low-rank factorization $X = YY^T$ of the solution is assumed, with $Y\in\mathbb{R}^{n\times r}$ and $r$ potentially much smaller than $n$. In the optimization literature, the Burer–Monteiro method has been studied extensively as a nonconvex optimization problem, e.g., in terms of algorithms [36], quality of the optimal value [9, 47], and (global) recovery guarantees [14, 15, 18].

In a time-varying setting, the Burer–Monteiro factorization leads to

$$
\begin{aligned}
\min_{Y\in\mathbb{R}^{n\times r}}\quad & \langle C_t, YY^T\rangle\\
\text{s.t.}\quad & \mathcal{A}_t(YY^T) = b_t,
\end{aligned}
\qquad (\mathrm{BM}_t)
$$

which for every fixed $t$ is a quadratically constrained quadratic problem. A solution is then a curve $t\mapsto Y_t$ in $\mathbb{R}^{n\times r}$, which, depending on $r$, is a space of much smaller dimension than $\mathbb{S}^n$. However, this comes at the price that the problem (BM$_t$) is now nonconvex. Moreover, in theory local optimization methods may converge to a critical point that is not globally optimal [54], although in practice the method usually performs very well [16, 36, 49].
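As a small illustration of (BM$_t$) at one fixed time $t$, the sketch below evaluates the objective $\langle C_t, YY^T\rangle$ and the constraint residual $\mathcal{A}_t(YY^T)-b_t$ for a given factor $Y$. The helper name `bm_objective_and_residual` and the toy data are hypothetical: the constraints $\operatorname{diag}(X)=\mathbf{1}$ mimic the Max-Cut relaxation used later, and the cost $C=I$ is chosen only so that the objective is easy to check. Note that $Y$ has $nr$ entries, whereas a symmetric $X$ has $\frac{1}{2}n(n+1)$.

```python
import numpy as np


def bm_objective_and_residual(C, A_list, b, Y):
    """Evaluate <C, YY^T> and the residual A(YY^T) - b for a factor Y."""
    X = Y @ Y.T  # PSD with rank <= r by construction
    objective = np.sum(C * X)  # Frobenius inner product <C, X>
    residual = np.array([np.sum(A_i * X) for A_i in A_list]) - b
    return objective, residual


rng = np.random.default_rng(0)
n, r = 5, 2
Y = rng.standard_normal((n, r))
Y /= np.linalg.norm(Y, axis=1, keepdims=True)  # unit rows, so diag(YY^T) = 1
A_list = [np.outer(e, e) for e in np.eye(n)]   # A_i = e_i e_i^T encodes X_ii = 1
b = np.ones(n)
C = np.eye(n)                                  # toy cost: <I, X> = trace(X)

obj, res = bm_objective_and_residual(C, A_list, b, Y)
print(obj, np.abs(res).max())
```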

The aim of this work is to combine the Burer–Monteiro factorization with path-following predictor-corrector methods and to develop a practical algorithm for approximating the solution of (BM$_t$), and consequently of (SDP$_t$), over time. As we explain in section 3, to apply such methods, we need to address the issue that the solutions of (BM$_t$) are never isolated, due to the nonuniqueness of the Burer–Monteiro factorization caused by orthogonal invariance. In this paper, we apply a well-known technique to handle this problem by restricting the solutions to a so-called horizontal space at every time step. From a geometric perspective, such an approach exploits the fact that equivalent factorizations can be identified as the same element in the corresponding quotient manifold with respect to the orthogonal group action [39].

The paper is structured as follows. In section 2 we review important foundations from the SDP literature and state the main assumptions we make on the TV-SDP problem (SDP$_t$). Section 3 presents the underlying quotient geometry of positive semidefinite rank-$r$ matrices from a linear algebra perspective, focusing in particular on the notion of horizontal space and the domain of injectivity of the map $Y\mapsto YY^T$. We then describe in section 4 our path-following predictor-corrector algorithm, which is based on iteratively solving the linearized KKT system for (BM$_t$) over time. A main result is the rigorous error analysis for this algorithm presented in subsection 4.3. In section 5, we showcase numerical results that test our method on a time-varying variant of the well-known Goemans–Williamson SDP relaxation for the Max-Cut problem in combinatorial optimization and graph theory. We conclude in section 6 with a brief discussion of our results.

2 Preliminaries and key assumptions

Naturally, the rigorous formulation of path-following algorithms requires regularity assumptions on the solution curve. In our context, this will require assumptions both on the original TV-SDP problem (SDP$_t$) and on its reformulation (BM$_t$). In particular, for the latter, the correct choice of the dimension $r$ is crucial. In what follows, we present and discuss these assumptions in detail.

First, we briefly review some standard notions and properties for primal-dual SDP pairs; see [4, 56]. Consider the conic dual problem of (SDP$_t$):

$$
\begin{aligned}
\max_{w\in\mathbb{R}^m}\quad & \langle b_t, w\rangle\\
\text{s.t.}\quad & Z(w)\coloneqq C_t-\mathcal{A}_t^*(w)\succeq 0,
\end{aligned}
\qquad (\text{D-SDP}_t)
$$

where $\mathcal{A}_t^*\colon w\mapsto\sum_{i=1}^m w_iA_{i,t}$ is the linear operator adjoint to $\mathcal{A}_t$. For convenience, we often drop the explicit dependence on $w$ and refer to a solution of (D-SDP$_t$) simply as $Z$. While reviewing the basic properties of SDPs, we assume the time parameter to be fixed and hence omit the subindex $t$.
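For concreteness, here is a minimal Python sketch of $\mathcal{A}$ and $\mathcal{A}^*$ at a fixed time, with $\mathcal{A}$ represented as a list of symmetric matrices $A_1,\dots,A_m$ (an assumed data layout, not prescribed by the text). It checks the defining adjoint identity $\langle\mathcal{A}(X),w\rangle = \langle X,\mathcal{A}^*(w)\rangle$ on random data and forms the dual slack $Z(w)=C-\mathcal{A}^*(w)$. All function names are hypothetical.

```python
import numpy as np


def A_op(A_list, X):
    """A(X) = (<A_1, X>, ..., <A_m, X>) with the Frobenius inner product."""
    return np.array([np.sum(A_i * X) for A_i in A_list])


def A_adj(A_list, w):
    """Adjoint operator: A^*(w) = sum_i w_i A_i."""
    return sum(w_i * A_i for w_i, A_i in zip(w, A_list))


def dual_slack(C, A_list, w):
    """Z(w) = C - A^*(w); w is dual feasible iff this matrix is PSD."""
    return C - A_adj(A_list, w)


def sym(M):
    return (M + M.T) / 2


rng = np.random.default_rng(0)
n, m = 4, 3
A_list = [sym(rng.standard_normal((n, n))) for _ in range(m)]
X = sym(rng.standard_normal((n, n)))
w = rng.standard_normal(m)

lhs = A_op(A_list, X) @ w           # <A(X), w> in R^m
rhs = np.sum(X * A_adj(A_list, w))  # <X, A^*(w)> in S^n
print(lhs, rhs)
```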

The KKT conditions for the pair of primal-dual convex problems (SDP$_t$)–(D-SDP$_t$) read

$$
\begin{aligned}
\mathcal{A}(X) &= b, & X &\succeq 0,\\
Z+\mathcal{A}^*(w) &= C, & Z &\succeq 0,\\
XZ &= 0.
\end{aligned}
\qquad (2.1)
$$

These are sufficient conditions for the optimality of the pair $(X,Z)$.
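The conditions (2.1) translate directly into numerical residuals. The following sketch (hypothetical name `kkt_residuals`) reports the norms of the primal and dual equality residuals and of the product $XZ$, together with the smallest eigenvalues of $X$ and $Z$; a candidate triple $(X,w,Z)$ satisfies (2.1) up to a tolerance if the first three quantities are small and the eigenvalues are not significantly negative. The test instance, $\min\langle\operatorname{diag}(1,2),X\rangle$ subject to $\operatorname{tr}X=1$ and $X\succeq 0$, is a toy example with solution $X=\operatorname{diag}(1,0)$, $w=1$, $Z=\operatorname{diag}(0,1)$.

```python
import numpy as np


def kkt_residuals(A_list, b, C, X, w, Z):
    """Residuals of the KKT system (2.1) for a candidate triple (X, w, Z)."""
    primal = np.array([np.sum(A_i * X) for A_i in A_list]) - b  # A(X) - b
    dual = Z + sum(w_i * A_i for w_i, A_i in zip(w, A_list)) - C  # Z + A^*(w) - C
    return {
        "primal": np.linalg.norm(primal),
        "dual": np.linalg.norm(dual),
        "complementarity": np.linalg.norm(X @ Z),  # ||XZ||_F
        "min_eig_X": np.linalg.eigvalsh(X).min(),  # should be >= 0
        "min_eig_Z": np.linalg.eigvalsh(Z).min(),  # should be >= 0
    }


# Toy instance: min <diag(1, 2), X> s.t. trace(X) = 1, X PSD.
res = kkt_residuals(
    A_list=[np.eye(2)], b=np.array([1.0]), C=np.diag([1.0, 2.0]),
    X=np.diag([1.0, 0.0]), w=np.array([1.0]), Z=np.diag([0.0, 1.0]),
)
print(res)
```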

Definition 2.1 (strict feasibility).

We say that strict feasibility holds for an instance of the primal SDP if there exists a positive definite matrix $X\succ 0$ that satisfies $\mathcal{A}(X)=b$. Similarly, strict feasibility holds for the dual if there exists a vector $w\in\mathbb{R}^m$ satisfying $Z(w)\succ 0$.

It is well known that under strict feasibility the KKT conditions are also necessary for optimality. Note that, in general, a pair $(X,Z)$ of optimal solutions satisfies the inclusions $\operatorname{im}X\subseteq\ker Z$ and $\operatorname{im}Z\subseteq\ker X$, where "im" and "ker" denote the image and kernel, respectively.

Definition 2.2 (strict complementarity).

A primal-dual optimal point $(X,Z)$ is said to be strictly complementary if $\operatorname{im}X=\ker Z$ (or, equivalently, $\operatorname{im}Z=\ker X$). A primal-dual pair of an instance of SDP satisfies strict complementarity if there exists a strictly complementary primal-dual optimal point $(X,Z)$.
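Since a complementary pair satisfies $\operatorname{im}X\subseteq\ker Z$ and $\dim\ker Z=n-\operatorname{rank}Z$, strict complementarity is equivalent to $\operatorname{rank}X+\operatorname{rank}Z=n$. The sketch below uses this rank criterion. It assumes that $XZ=0$ already holds, and the eigenvalue threshold `tol` defining the numerical rank is a hypothetical choice; the function names are likewise illustrative.

```python
import numpy as np


def numerical_rank(M, tol=1e-9):
    """Number of eigenvalues of the symmetric matrix M with absolute value above tol."""
    return int(np.sum(np.abs(np.linalg.eigvalsh(M)) > tol))


def is_strictly_complementary(X, Z, tol=1e-9):
    """Assumes XZ = 0, so im X is contained in ker Z; equality iff ranks add up to n."""
    n = X.shape[0]
    return numerical_rank(X, tol) + numerical_rank(Z, tol) == n


X = np.diag([1.0, 0.0])
print(is_strictly_complementary(X, np.diag([0.0, 1.0])))  # ranks 1 + 1 = 2
print(is_strictly_complementary(X, np.zeros((2, 2))))     # ranks 1 + 0 < 2
```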

Definition 2.3 (nondegeneracy).

A primal feasible point $X$ is primal nondegenerate if

$$\ker\mathcal{A}+\mathcal{T}_X=\mathbb{S}^n, \qquad (2.2)$$

with $\mathcal{T}_X$ being the tangent space at $X$ to the manifold $\mathcal{M}_r$ of symmetric matrices of fixed rank $r$, where $r=\operatorname{rank}X$. Let $X=YY^T$ be a rank-revealing decomposition, i.e., $Y\in\mathbb{R}^{n\times r}$. Then

$$\mathcal{T}_X=\{YV^T+VY^T : V\in\mathbb{R}^{n\times r}\}.$$
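Condition (2.2) can be tested numerically. If $\mathcal{A}$ is surjective (assumption (A2) below), then $\ker\mathcal{A}+\mathcal{T}_X=\mathbb{S}^n$ holds if and only if $\mathcal{A}(\mathcal{T}_X)=\mathbb{R}^m$, that is, if and only if the linear map $V\mapsto\mathcal{A}(YV^T+VY^T)$ is onto. Since $\langle A_i,YV^T+VY^T\rangle=\langle 2A_iY,V\rangle$ for symmetric $A_i$, this amounts to linear independence of the $m$ matrices $2A_iY$, which are the gradients of the constraints $\langle A_i,YY^T\rangle=b_i$ of (BM$_t$). This reformulation is a short argument added here for illustration; the function name and tolerance below are hypothetical, and the sketch is not a robust numerical test.

```python
import numpy as np


def is_primal_nondegenerate(A_list, Y, tol=1e-9):
    """Check ker A + T_X = S^n at X = YY^T, assuming A is surjective.

    Equivalent to V -> A(YV^T + VY^T) being onto R^m. Row i of the matrix J
    is vec(2 A_i Y), because <A_i, YV^T + VY^T> = <2 A_i Y, V> for symmetric A_i.
    """
    J = np.array([(2.0 * A_i @ Y).ravel() for A_i in A_list])  # shape (m, n*r)
    return np.linalg.matrix_rank(J, tol=tol) == len(A_list)


# Max-Cut-type constraints X_11 = X_22 = 1 at the rank-one point X = 11^T.
A_list = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
Y = np.array([[1.0], [1.0]])
print(is_primal_nondegenerate(A_list, Y))  # two gradients in R^2, independent

# A third constraint cannot be matched: only n*r = 2 directions are available.
A3 = np.array([[0.0, 1.0], [1.0, 0.0]])
print(is_primal_nondegenerate(A_list + [A3], Y))
```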

A dual feasible point $Z$ is dual nondegenerate if

$$\operatorname{im}\mathcal{A}^*+\mathcal{T}_Z=\mathbb{S}^n,$$

where $\mathcal{T}_Z$ is the tangent space at $Z$ to the manifold of symmetric matrices of fixed rank $s$, with $s=\operatorname{rank}Z$.

Primal-dual strict feasibility implies the existence of both a primal and a dual optimal solution with a zero duality gap. In addition, primal (dual) nondegeneracy implies dual (primal) uniqueness of the solutions. Under strict complementarity, the converse also holds: primal (dual) uniqueness of the optimal solutions implies dual (primal) nondegeneracy. Moreover, primal-dual nondegeneracy and strict complementarity hold generically. We refer to [4] for details.

With regard to the time-varying case, these facts can be generalized as follows.

Theorem 2.4.

(Bellon et al. [12, Theorem 2.19]) Let (P$_t$, D$_t$) be a primal-dual pair of TV-SDPs parametrized over a time interval $[0,T]$ such that primal-dual strict feasibility holds for every $t\in[0,T]$, and assume that the data $\mathcal{A}_t, b_t, C_t$ are continuously differentiable functions of $t$. Let $t^*\in[0,T]$ be a fixed value of the time parameter and suppose that $(X^*,Z^*)$ is a nondegenerate, optimal, and strictly complementary point for (P$_{t^*}$, D$_{t^*}$).
Then there exist $\varepsilon>0$ and a unique continuously differentiable map $t\mapsto(X_t,Z_t)$ defined on $(t^*-\varepsilon,t^*+\varepsilon)$ such that $(X_t,Z_t)$ is a unique and strictly complementary primal-dual optimal point of (P$_t$, D$_t$) for all $t\in(t^*-\varepsilon,t^*+\varepsilon)$. In particular, the ranks of $X_t$ and $Z_t$ are constant for all $t\in(t^*-\varepsilon,t^*+\varepsilon)$.

The last statement of the theorem follows directly from the fact that a change in the rank of either $X_t$ or $Z_t$ implies a loss of strict complementarity because of the lower semicontinuity of the rank. Based on these facts, for the initial problem (SDP$_t$) we make the following assumptions.

  (A1) (SDP$_t$) and (D-SDP$_t$) are strictly feasible for every $t\in[0,T]$.

  (A2) The linear operator $\mathcal{A}_t$ is surjective for every $t\in[0,T]$.

  (A3) (SDP$_t$) has a primal nondegenerate solution $X_t$ and (D-SDP$_t$) has a dual nondegenerate solution $Z_t$ for every $t\in[0,T]$.

  (A4) The solution pair $(X_t,Z_t)$ is strictly complementary for every $t\in[0,T]$.

  (A5) The data $\mathcal{A}_t, b_t, C_t$ are continuously differentiable functions of $t$.

Assumptions (A1) and (A2) are standard for SDPs and for linearly constrained optimization in general, while assumptions (A3)–(A4) rule out many “pathological” cases [12]. In particular, assumption (A3) implies that the solution pair $(X_t,Z_t)$ is unique. By Theorem 2.4, assumptions (A3)–(A4) and (A5) have the following consequences:

  (C1) (SDP$_t$) has a unique and smooth solution curve $X_t$, $t\in[0,T]$.

  (C2) The curve $t\mapsto X_t$ is of constant rank $r^*$.


For setting up the factorized version (BM$_t$) of (SDP$_t$), it is necessary to choose the dimension $r$ of the factor matrix $Y$ in (BM$_t$), ideally equal to the rank $r^*$ from (C2). In what follows, we assume that we know the constant rank $r^*$. Given access to an initial solution $X_0$ at time $t=0$, one can compute $r^*=\operatorname{rank}X_0$, so this assumption entails no further loss of generality.
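Assuming an initial solution $X_0$ is available, one way to obtain both $r^*$ and an initial factor $Y_0$ with $Y_0Y_0^T=X_0$ is a truncated eigendecomposition $X_0=U\Lambda U^T$, $Y_0=U_{r^*}\Lambda_{r^*}^{1/2}$. The sketch below is an assumption about how this could be done, not the procedure used in the paper. In floating point the rank must be decided with a threshold; the relative tolerance `tol` and the function name are hypothetical.

```python
import numpy as np


def factor_from_solution(X0, tol=1e-8):
    """Return (r, Y0) with Y0 @ Y0.T ~ X0, where r is the numerical rank of X0."""
    eigvals, eigvecs = np.linalg.eigh(X0)  # eigenvalues in ascending order
    # Keep eigenvalues that are large relative to the largest one.
    keep = eigvals > tol * max(1.0, eigvals.max())
    r = int(keep.sum())
    Y0 = eigvecs[:, keep] * np.sqrt(eigvals[keep])  # scale each kept eigenvector
    return r, Y0


rng = np.random.default_rng(0)
U = rng.standard_normal((6, 2))
X0 = U @ U.T  # a rank-2 PSD matrix standing in for an initial SDP solution
r_star, Y0 = factor_from_solution(X0)
print(r_star, np.abs(Y0 @ Y0.T - X0).max())
```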

It is worth noting that the rank cannot be arbitrary. By a well-known result of Barvinok and Pataki [10], any SDP that is defined by $m$ linearly independent constraints and has an optimal solution has an optimal solution of rank $r$ with $\frac{1}{2}r(r+1)\le m$. Since we assume that $X_t$ is the unique solution of (SDP$_t$) and has constant rank $r^*$, we conclude that

$$\tfrac{1}{2}r^*(r^*+1)\le m.$$
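The bound caps the rank that the factor dimension has to accommodate: $\frac{1}{2}r(r+1)\le m$ is equivalent to $r\le\frac{1}{2}(\sqrt{8m+1}-1)$. A short exact integer computation is sketched below (the function name is hypothetical). For instance, the Max-Cut relaxation has one constraint $X_{ii}=1$ per vertex, so $m=n$, and for $n=100$ the bound gives $r^*\le 13$.

```python
from math import isqrt


def barvinok_pataki_max_rank(m: int) -> int:
    """Largest integer r >= 0 with r(r + 1)/2 <= m, in exact integer arithmetic."""
    # r(r + 1)/2 <= m  <=>  r <= (sqrt(8m + 1) - 1)/2; isqrt avoids rounding issues.
    return (isqrt(8 * m + 1) - 1) // 2


print(barvinok_pataki_max_rank(100))  # → 13
```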

We point out that recently the Barvinok–Pataki bound has been slightly improved [33].

3 Quotient geometry of positive semidefinite rank-$r$ matrices

We now investigate the factorized formulation (BM$_t$) in more detail. As already mentioned, in contrast to the original problem (SDP$_t$), this is a nonlinear problem (specifically, a quadratically constrained quadratic problem) which is nonconvex. Moreover, the uniqueness of a solution, which is guaranteed by (C1) for the original problem (SDP$_t$), is lost in (BM$_t$), because the parametrization via the map

$$\phi\colon\mathbb{R}^{n\times r}\to\mathbb{S}^n,\quad \phi(Y)=YY^T$$

is not injective. In fact, this map is invariant under the orthogonal group action

$$\mathcal{O}_r\times\mathbb{R}^{n\times r}\to\mathbb{R}^{n\times r},\quad (Q,Y)\mapsto YQ,$$

on $\mathbb{R}^{n\times r}$, where

$$\mathcal{O}_r:=\{Q\in\mathbb{R}^{r\times r} : QQ^T=I_r\},$$

with $I_r$ denoting the $r\times r$ identity matrix, is the orthogonal group. Hence both the objective function $Y\mapsto\langle C_t,YY^T\rangle$ and the constraints $\mathcal{A}_t(YY^T)=b_t$ in (BM$_t$) are invariant under the same action. As a consequence, the solutions of (BM$_t$) are never isolated [36]. This poses a technical obstacle to the use of path-following algorithms, as the path needs to be, at least locally, uniquely defined.
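The invariance, and the reason it prevents solutions from being isolated, can be observed numerically. In the sketch below (random data; the helper `rotation` is hypothetical and restricted to $r=2$), rotating a factor $Y$ by an angle $\theta$ gives a different factor $YQ$ that lies arbitrarily close to $Y$ as $\theta\to 0$ but represents the same matrix $YY^T$, and therefore has the same objective and constraint values in (BM$_t$).

```python
import numpy as np


def rotation(theta):
    """2x2 rotation matrix: an element of O_2 with determinant +1."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


rng = np.random.default_rng(0)
n = 6
Y = rng.standard_normal((n, 2))  # random factor with r = 2
X = Y @ Y.T

for theta in (1e-1, 1e-3, 1e-6):
    YQ = Y @ rotation(theta)
    # YQ differs from Y (the distance shrinks with theta), yet (YQ)(YQ)^T = YY^T.
    print(f"theta={theta:g}  |YQ - Y|={np.linalg.norm(YQ - Y):.2e}  "
          f"same X: {np.allclose(YQ @ YQ.T, X)}")
```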

On the other hand, assuming that the correct rank $r=r^*$ of the unique solution $X_t$ of (SDP$_t$) has been chosen for the factorization, any solution $Y_t$ of (BM$_t$) must satisfy $Y_tY_t^T=X_t$. It follows that any solution is of the form $Y_tQ$ with $Q\in\mathcal{O}_r$; see, e.g., [17, Lemma 2.1]. In other words, the action of the orthogonal group is indeed the only source of nonuniqueness. This corresponds to the well-known fact that the set of positive semidefinite symmetric matrices of fixed rank $r$, which we denote by $\mathcal{M}_r^+$, is a smooth manifold that can be identified with the quotient manifold $\mathbb{R}_*^{n\times r}/\mathcal{O}_r$, where $\mathbb{R}_*^{n\times r}$ is the open set of $n\times r$ matrices with full column rank.

In the following, we describe how the nonuniqueness can be removed by introducing a so-called horizontal space, which is a standard concept in optimization on quotient manifolds; see, e.g., [2, section 3.5.8]. For positive semidefinite fixed-rank matrices, this has been worked out in detail in [39]. Additional material, including the complex Hermitian case, can be found in [8]. However, in order to arrive at practical formulas that are useful for our path-following algorithm later on, we will not further refer to the concept of a quotient manifold but directly focus on the injectivity of the map $\phi$ on suitable linear subspaces of $\mathbb{R}^{n\times r}$, which we describe in the following subsection. This simplification is justified because we are dealing with a quotient manifold $\mathbb{R}_*^{n\times r}/\mathcal{O}_r$ with $\mathbb{R}_*^{n\times r}$ being just an open subset of $\mathbb{R}^{n\times r}$. The horizontal space at a point $Y$ should then be a subspace of the tangent space of $\mathbb{R}_*^{n\times r}$ at $Y$, which, however, is just $\mathbb{R}^{n\times r}$.

3.1 Horizontal space and unique factorizations

Given $Y\in\mathbb{R}^{n\times r}_*$, we denote the corresponding orbit under the orthogonal group by

$$Y\mathcal{O}_r:=\{YQ\,:\,Q\in\mathcal{O}_r\}\subseteq\mathbb{R}^{n\times r}_*.$$

The orbit $Y\mathcal{O}_r$ is an embedded submanifold of $\mathbb{R}^{n\times r}_*$ of dimension $\tfrac{1}{2}r(r-1)$ with two connected components, according to $\det Q=\pm 1$. Its tangent space at $Y$, which we denote by $\mathcal{T}_Y$, is easily derived by noting that the tangent space to the orthogonal group $\mathcal{O}_r$ at the identity matrix equals the space $\mathbb{S}^r_{skew}$ of real skew-symmetric $r\times r$ matrices (see, e.g., [2, Example 3.5.3]). Therefore,

$$\mathcal{T}_Y=\{YS\,:\,S\in\mathbb{S}^r_{skew}\}.$$

Since the map $\phi(Y)=YY^T$ is constant on $Y\mathcal{O}_r$, its derivative

$$H\mapsto\phi'(Y)[H]=YH^T+HY^T$$

vanishes on $\mathcal{T}_Y$, that is, $\mathcal{T}_Y\subseteq\ker\phi'(Y)$.
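Both statements, that $\phi$ is constant on the orbit and that $\phi'(Y)$ vanishes on $\mathcal{T}_Y$, are easy to confirm numerically. In the NumPy sketch below (random data of our own choosing), a skew-symmetric $S$ is turned into an orthogonal matrix by the Cayley transform $Q=(I-S/2)^{-1}(I+S/2)$; this is used only as a convenient way to move along $Y\mathcal{O}_r$ without a matrix exponential and is not taken from the text.

```python
import numpy as np

def phi(Y):
    """Burer-Monteiro map phi(Y) = Y Y^T."""
    return Y @ Y.T

def dphi(Y, H):
    """Derivative of phi at Y in direction H: Y H^T + H Y^T."""
    return Y @ H.T + H @ Y.T

rng = np.random.default_rng(1)
n, r = 5, 3
Y = rng.standard_normal((n, r))
M = rng.standard_normal((r, r))
S = M - M.T                                    # skew-symmetric r x r

# Cayley transform of S: an orthogonal matrix (no matrix exponential needed).
I = np.eye(r)
Q = np.linalg.solve(I - S / 2, I + S / 2)

orbit_gap = np.linalg.norm(phi(Y @ Q) - phi(Y))   # phi is constant on Y O_r
kernel_gap = np.linalg.norm(dphi(Y, Y @ S))       # Y S lies in ker phi'(Y)
print(orbit_gap, kernel_gap)                      # both at round-off level
```

The kernel identity is exact algebra: $\phi'(Y)[YS]=Y(S^T+S)Y^T=0$.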

The horizontal space at $Y$, denoted by $\mathcal{H}_Y$, is the orthogonal complement of $\mathcal{T}_Y$ with respect to the Frobenius inner product. One verifies that

$$\mathcal{H}_Y:=\mathcal{T}_Y^{\perp}=\{H\in\mathbb{R}^{n\times r}\,:\,Y^TH=H^TY\},$$

since $0=\langle H,YS\rangle=\langle Y^TH,S\rangle$ holds for all skew-symmetric $S$ if and only if $Y^TH$ is symmetric. We point out that sometimes any subspace complementary to $\mathcal{T}_Y$ is called a horizontal space, but we will stick to the above choice, as it is the most common one and has certain theoretical and practical advantages. In particular, since $Y\in\mathcal{H}_Y$ (because $Y^TY$ is symmetric), the affine space $Y+\mathcal{H}_Y$ equals $\mathcal{H}_Y$, so it is just a linear space.
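In computations one often needs the component of a direction $H$ that lies in $\mathcal{H}_Y$. Below is a minimal NumPy sketch of the orthogonal projection onto $\mathcal{H}_Y$; the function name is our own. Writing $H=YS+H_{\mathrm{hor}}$ with $S$ skew-symmetric and requiring $Y^TH_{\mathrm{hor}}$ to be symmetric leads to the Sylvester equation $(Y^TY)S+S(Y^TY)=Y^TH-H^TY$, which the sketch solves in the eigenbasis of the positive definite matrix $Y^TY$, assuming $Y$ has full column rank.

```python
import numpy as np

def horizontal_projection(Y, H):
    """Frobenius-orthogonal projection of H onto H_Y = {H : Y^T H = H^T Y}.

    Assumes Y has full column rank. Removes the vertical part Y S, where the
    skew-symmetric S solves (Y^T Y) S + S (Y^T Y) = Y^T H - H^T Y.
    """
    d, W = np.linalg.eigh(Y.T @ Y)            # Y^T Y = W diag(d) W^T, d > 0
    C = W.T @ (Y.T @ H - H.T @ Y) @ W         # right-hand side in eigenbasis
    S = W @ (C / (d[:, None] + d[None, :])) @ W.T
    return H - Y @ S

rng = np.random.default_rng(2)
n, r = 6, 3
Y = rng.standard_normal((n, r))
H = rng.standard_normal((n, r))
H_hor = horizontal_projection(Y, H)

print(np.allclose(Y.T @ H_hor, H_hor.T @ Y))                 # H_hor lies in H_Y
print(np.allclose(horizontal_projection(Y, H_hor), H_hor))   # idempotent
```

Since the right-hand side is skew-symmetric and the denominators $d_i+d_j$ are symmetric in $(i,j)$, the computed $S$ is automatically skew-symmetric.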

The purpose of the horizontal space is to provide a unique way of representing a neighborhood of $X=YY^T$ in $\mathcal{M}_r^{+}$ through $\phi(Y+H)=(Y+H)(Y+H)^T$ with $H\in\mathcal{H}_Y$. Clearly,

$$\dim\mathcal{H}_Y=nr-\dim\mathcal{O}_r=nr-\tfrac{1}{2}r(r-1)=\dim\mathcal{M}_r^{+}.$$
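The dimension count can be double-checked numerically: $\mathcal{H}_Y$ is the null space of the linear map $H\mapsto Y^TH-H^TY$, whose image is the whole space of skew-symmetric $r\times r$ matrices when $Y$ has full column rank. The sketch below (hypothetical function name, random test matrices) assembles this map as a matrix and compares its nullity with $nr-\tfrac12 r(r-1)$.

```python
import numpy as np

def horizontal_space_dimension(Y):
    """Nullity of the linear map H -> Y^T H - H^T Y on R^{n x r}."""
    n, r = Y.shape
    columns = []
    for k in range(n * r):
        E = np.zeros(n * r)
        E[k] = 1.0                                  # k-th standard basis matrix
        E = E.reshape(n, r)
        columns.append((Y.T @ E - E.T @ Y).ravel())
    L = np.column_stack(columns)                    # (r*r) x (n*r) matrix of the map
    return n * r - np.linalg.matrix_rank(L)

rng = np.random.default_rng(2)
for n, r in [(7, 3), (5, 5), (6, 1)]:
    Y = rng.standard_normal((n, r))
    print(n, r, horizontal_space_dimension(Y), n * r - r * (r - 1) // 2)
```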

Moreover, the following holds.

Proposition 3.1.

The restriction of $\phi'(Y)$ to $\mathcal{H}_Y$ is injective. More precisely, it holds that

$$\|YH^T+HY^T\|_F\geq\sqrt{2}\,\sigma_r(Y)\|H\|_F\quad\text{for all } H\in\mathcal{H}_Y,$$

where $\sigma_r(Y)>0$ is the smallest singular value of $Y$. This lower bound is sharp if $r<n$. For $r=n$ one has the sharp estimate

$$\|YH^T+HY^T\|_F\geq 2\sigma_r(Y)\|H\|_F\quad\text{for all } H\in\mathcal{H}_Y.$$

As a consequence, in either case, $\ker\phi'(Y)=\mathcal{T}_Y$.

Proof.

For $Z\in\mathbb{S}^n$ we have $\operatorname{trace}((YH^T+HY^T)Z)=2\operatorname{trace}(ZYH^T)$ by standard properties of the trace. Taking $Z=YH^T+HY^T$ yields

$$\|YH^T+HY^T\|_F^2=2\operatorname{trace}(YH^TYH^T+HY^TYH^T)=2\|Y^TH\|_F^2+2\|YH^T\|_F^2.$$

To derive the second equality we used $Y^TH=H^TY$ for $H\in\mathcal{H}_Y$, which gives $\operatorname{trace}(YH^TYH^T)=\operatorname{trace}((H^TY)(H^TY))=\operatorname{trace}((Y^TH)^T(Y^TH))=\|Y^TH\|_F^2$. Since $Y$ has full column rank, $\|YH^T\|_F^2\geq\sigma_r(Y)^2\|H\|_F^2$, and if $r=n$ we also have $\|Y^TH\|_F^2\geq\sigma_r(Y)^2\|H\|_F^2$. This proves the asserted lower bounds.
To show that they are sharp, let $(u_r,v_r)$ be a (normalized) pair of singular vectors such that $Yv_r=\sigma_r(Y)u_r$. If $r<n$, then for any nonzero $u$ such that $u^TY=0$ one verifies that the matrix $H=uv_r^T$ is in $\mathcal{H}_Y$ and achieves equality. When $r=n$, $H=u_rv_r^T$ achieves it. ∎
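Proposition 3.1 can be probed numerically by sampling random elements of $\mathcal{H}_Y$ and evaluating the witnesses from the proof. The NumPy sketch below is illustrative only; it generates horizontal directions as $H=Y(Y^TY)^{-1}M+N$ with $M$ symmetric and $Y^TN=0$ (so that $Y^TH=M$), a parametrization we chose for convenience. The sampled ratios should stay above $\sqrt2$ (tall case) or $2$ (square case), and the witnesses should attain these values.

```python
import numpy as np

rng = np.random.default_rng(3)

def random_horizontal(Y):
    """Random element of H_Y: H = Y (Y^T Y)^{-1} M + N with M symmetric and
    Y^T N = 0, so that Y^T H = M is symmetric."""
    n, r = Y.shape
    M = rng.standard_normal((r, r))
    M = M + M.T
    N = rng.standard_normal((n, r))
    N -= Y @ np.linalg.solve(Y.T @ Y, Y.T @ N)   # remove the range(Y) part
    return Y @ np.linalg.solve(Y.T @ Y, M) + N

def ratio(Y, H):
    """||Y H^T + H Y^T||_F / (sigma_r(Y) ||H||_F)."""
    sigma_r = np.linalg.svd(Y, compute_uv=False)[-1]
    return np.linalg.norm(Y @ H.T + H @ Y.T) / (sigma_r * np.linalg.norm(H))

# Case r < n: the bound sqrt(2) is attained by H = u v_r^T with u^T Y = 0.
Y = rng.standard_normal((6, 3))
U, s, Vt = np.linalg.svd(Y)                     # full SVD, U is 6 x 6
H_sharp = np.outer(U[:, 3], Vt[-1])             # U[:, 3] is orthogonal to range(Y)
min_tall = min(ratio(Y, random_horizontal(Y)) for _ in range(1000))

# Case r = n: the bound 2 is attained by H = u_r v_r^T.
Ysq = rng.standard_normal((4, 4))
Usq, ssq, Vtsq = np.linalg.svd(Ysq)
H_sharp_sq = np.outer(Usq[:, -1], Vtsq[-1])
min_square = min(ratio(Ysq, random_horizontal(Ysq)) for _ in range(1000))

print(min_tall, ratio(Y, H_sharp))              # at least sqrt(2); witness ratio
print(min_square, ratio(Ysq, H_sharp_sq))       # at least 2; witness ratio
```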

Since $\phi$ maps $\mathbb{R}_*^{n\times r}$ to $\mathcal{M}_r^{+}$, which is of the same dimension as $\mathcal{H}_Y$, the above proposition implies that $\phi'(Y)$ is a bijection between $\mathcal{H}_Y$ and $\mathcal{T}_{\phi(Y)}\mathcal{M}_r^{+}$. By the inverse function theorem, this already shows that the restriction of $\phi$ to the linear space $Y+\mathcal{H}_Y=\mathcal{H}_Y$ is a local diffeomorphism between a neighborhood of $Y$ in $\mathcal{H}_Y$ and a neighborhood of $\phi(Y)$ in $\mathcal{M}_r^{+}$.
The following, more quantitative statement matches Theorem 6.3 in [39] on the injectivity radius of the quotient manifold $\mathbb{R}_*^{n\times r}/\mathcal{O}_r$. For convenience, we provide a self-contained proof that is more algebraic and does not require the concept of quotient manifolds.

Proposition 3.2.

Let $\mathcal{B}_Y:=\{H\in\mathcal{H}_Y\colon\|H\|_F<\sigma_r(Y)\}$. Then the restriction of $\phi$ to $Y+\mathcal{B}_Y$ is injective and maps diffeomorphically onto a (relatively) open neighborhood of $\phi(Y)=YY^T$ in $\mathcal{M}_r^{+}$.

It is interesting to note that $\mathcal{B}_Y$ is the largest possible ball in $\mathcal{H}_Y$ on which the result can hold: the rank-one matrices $\sigma_iu_iv_i^T$ formed from the singular triplets of $Y$ all belong to $\mathcal{H}_Y$, and $Y-\sigma_ru_rv_r^T$ is rank-deficient, although $H=-\sigma_ru_rv_r^T$ has norm $\sigma_r(Y)$. Another important observation is that $\sigma_r(Y)$ does not depend on the particular choice of $Y$ within the orbit $Y\mathcal{O}_r$.
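Injectivity on $Y+\mathcal{B}_Y$ means that, given any factor $\tilde Y$ of a matrix $(Y+H)(Y+H)^T$ with $H\in\mathcal{B}_Y$, the representative $Y+H$ is determined uniquely. One standard way to compute it, swapped in here as an illustration and not claimed to be the paper's procedure, is orthogonal Procrustes alignment of $\tilde Y$ to $Y$. That it returns exactly $Y+H$ relies on $Y^TY+Y^TH$ being symmetric positive definite, as established in the proof below. The sketch also checks the boundary example $Y-\sigma_ru_rv_r^T$; the function name and test data are our own.

```python
import numpy as np

def horizontal_representative(Y, Y_tilde):
    """Return the factor Y_tilde Q closest to Y in the orbit Y_tilde O_r.

    Orthogonal Procrustes: Q = P R^T from the SVD Y_tilde^T Y = P diag(s) R^T.
    At the optimum Y^T (Y_tilde Q) = R diag(s) R^T is symmetric, so
    Y_tilde Q - Y lies in the horizontal space H_Y.
    """
    P, _, Rt = np.linalg.svd(Y_tilde.T @ Y)
    return Y_tilde @ (P @ Rt)

rng = np.random.default_rng(4)
n, r = 7, 3
Y = rng.standard_normal((n, r))
U, s, Vt = np.linalg.svd(Y, full_matrices=False)

# A point Y + H with H in H_Y and ||H||_F = 0.5 sigma_r(Y), i.e. H in B_Y.
M = rng.standard_normal((r, r))
H = Y @ np.linalg.solve(Y.T @ Y, M + M.T)
H *= 0.5 * s[-1] / np.linalg.norm(H)

# Forget the representative: rotate Y + H by a random orthogonal matrix.
Q0, _ = np.linalg.qr(rng.standard_normal((r, r)))
Y_tilde = (Y + H) @ Q0
Z = horizontal_representative(Y, Y_tilde)

# The boundary example: H = -sigma_r u_r v_r^T has norm sigma_r(Y).
edge_rank = np.linalg.matrix_rank(Y - s[-1] * np.outer(U[:, -1], Vt[-1]))
print(np.allclose(Z, Y + H), edge_rank)        # recovered; rank drops to r - 1
```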

Proof.

Consider $H_1,H_2\in\mathcal{B}_Y$. Let $Y=U\Sigma V^T$ be a singular value decomposition of $Y$ with $U\in\mathbb{R}^{n\times r}$ and $V\in\mathbb{R}^{r\times r}$ having orthonormal columns. Assume first that $r<n$, and denote by $U_{\perp}\in\mathbb{R}^{n\times(n-r)}$ a matrix with orthonormal columns such that $U^TU_{\perp}=0$. In the case $r=n$, the terms involving $U_{\perp}$ in the following calculation are simply not present. We write

$$H_1=UA_1V^T+U_{\perp}B_1V^T,\quad H_2=UA_2V^T+U_{\perp}B_2V^T.$$

Here $A_1,A_2\in\mathbb{R}^{r\times r}$ and $B_1,B_2\in\mathbb{R}^{(n-r)\times r}$. Since $H_1,H_2\in\mathcal{H}_Y$, we have

$$\Sigma A_1=A_1^T\Sigma,\quad \Sigma A_2=A_2^T\Sigma.$$

Then a direct calculation yields

$$\begin{aligned}(Y+H_1)(Y+H_1)^T-YY^T&=U[\Sigma A_1^T+A_1\Sigma+A_1A_1^T]U^T\\&\quad+U[\Sigma+A_1]B_1^TU_{\perp}^T+U_{\perp}B_1[\Sigma+A_1^T]U^T+U_{\perp}B_1B_1^TU_{\perp}^T,\end{aligned}$$

and analogously for $(Y+H_2)(Y+H_2)^T-YY^T$. Since the four terms in the above sum are mutually orthogonal in the Frobenius inner product, the equality $(Y+H_1)(Y+H_1)^T=(Y+H_2)(Y+H_2)^T$ implies in particular

$$\Sigma A_1^T+A_1\Sigma+A_1A_1^T=\Sigma A_2^T+A_2\Sigma+A_2A_2^T,$$

as well as

$$(\Sigma+A_1)B_1^T=(\Sigma+A_2)B_2^T.\tag{3.1}$$

The first of these equations can be written as

$$\Sigma(A_1-A_2)^T+(A_1-A_2)\Sigma=A_2(A_2-A_1)^T-(A_1-A_2)A_1^T.$$

By Proposition 3.1 (with $n=r$, $Y=\Sigma$ and $H=A_1-A_2$, which lies in $\mathcal{H}_\Sigma$ because $\Sigma(A_1-A_2)=(A_1-A_2)^T\Sigma$),

$$\|\Sigma(A_1-A_2)^T+(A_1-A_2)\Sigma\|_F\geq 2\sigma_r(Y)\|A_1-A_2\|_F,$$

whereas, since $\|A_i\|_F\le\|H_i\|_F$ for $i=1,2$,

$$\|A_2(A_2-A_1)^T-(A_1-A_2)A_1^T\|_F\leq(\|H_2\|_F+\|H_1\|_F)\|A_1-A_2\|_F.$$

Since $\|H_2\|_F+\|H_1\|_F<2\sigma_r(Y)$, this shows that we must have $A_1=A_2$, which by (3.1) also implies $B_1=B_2$, since $\Sigma+A_1$ is invertible (note that $\|A_1\|_2\le\|H_1\|_F<\sigma_r(Y)=\sigma_{\min}(\Sigma)$).
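The four-term expansion of $(Y+H_1)(Y+H_1)^T-YY^T$ used in this argument holds for any $H_1$ (horizontality enters only through $\Sigma A_1=A_1^T\Sigma$), so it can be sanity-checked with random data. The NumPy sketch below, our own illustration, forms $A=U^THV$ and $B=U_\perp^THV$ from a full SVD and checks both the expansion and the mutual Frobenius orthogonality of the four terms.

```python
import numpy as np

rng = np.random.default_rng(5)
n, r = 6, 2
Y = rng.standard_normal((n, r))
H = rng.standard_normal((n, r))               # the expansion holds for any H

U_full, s, Vt = np.linalg.svd(Y)               # full SVD: U_full is n x n
U, U_perp, V = U_full[:, :r], U_full[:, r:], Vt.T
Sigma = np.diag(s)
A = U.T @ H @ V                                # H = U A V^T + U_perp B V^T
B = U_perp.T @ H @ V

blocks = [
    U @ (Sigma @ A.T + A @ Sigma + A @ A.T) @ U.T,
    U @ (Sigma + A) @ B.T @ U_perp.T,
    U_perp @ B @ (Sigma + A.T) @ U.T,
    U_perp @ B @ B.T @ U_perp.T,
]
lhs = (Y + H) @ (Y + H).T - Y @ Y.T

print(np.allclose(lhs, sum(blocks)))           # the four-term expansion
# Largest Frobenius inner product between two distinct blocks (round-off).
print(max(abs(np.sum(blocks[i] * blocks[j]))
          for i in range(4) for j in range(i + 1, 4)))
```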

Hence, we have proven that $\phi$ is an injective map from $Y+\mathcal{B}_Y$ to $\mathcal{M}_r^{+}$. To verify that it is a diffeomorphism onto its image, we show that it is locally a diffeomorphism, for which it again suffices to confirm that $\phi'(Y+H)$ is injective on $\mathcal{H}_Y$ for every $H\in\mathcal{B}_Y$ (since $\mathcal{H}_Y$ and $\mathcal{M}_r^{+}$ have the same dimension). It follows from Proposition 3.1 (with $Y$ replaced by $Y+H$, which has full column rank since $\|H\|_2\le\|H\|_F<\sigma_r(Y)$) that the null space of $\phi'(Y+H)$ equals $\mathcal{T}_{Y+H}$. We claim that $\mathcal{T}_{Y+H}\cap\mathcal{H}_Y=\{0\}$, which proves the injectivity of $\phi'(Y+H)$ on $\mathcal{H}_Y$.
Indeed, let $K$ be an element of the intersection, i.e., $K=(Y+H)S$ for some skew-symmetric $S$ and $Y^TK-K^TY=0$. Inserting the first relation into the second, and using $Y^TH=H^TY$, yields the homogeneous Lyapunov equation

$$(Y^TY+Y^TH)S+S(Y^TY+Y^TH)=0. \tag{3.2}$$

The symmetric matrix

$$Y^TY+Y^TH=\tfrac{1}{2}(Y+H)^T(Y+H)+\tfrac{1}{2}(Y^TY-H^TH)$$

in (3.2) is positive definite, since the first summand is positive semidefinite and the second is positive definite because $\lambda_1(H^TH)\le\|H^TH\|_F\le\|H\|_F^2<\sigma_r(Y)^2=\lambda_r(Y^TY)$ (here $\lambda_i$ denotes the $i$-th largest eigenvalue of the corresponding matrix). But then (3.2) implies $S=0$, since for a positive definite matrix $A$ the linear map $S\mapsto AS+SA$ has the positive eigenvalues $\lambda_i(A)+\lambda_j(A)$ and is therefore invertible; that is, $K=0$. ∎
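The argument above can be checked numerically. The following NumPy sketch is an illustration under our own assumptions, not part of the method: it projects a random matrix onto $\mathcal{H}_Y=\{H: Y^TH\ \text{symmetric}\}$ by removing its component in the vertical space $\mathcal{T}_Y=\{YS: S\ \text{skew-symmetric}\}$, scales it into $\mathcal{B}_Y$ (taken here as $\{H\in\mathcal{H}_Y:\|H\|_F<\sigma_r(Y)\}$, as used in the proof of Proposition 3.3), and confirms that the matrix $A$ in (3.2) is positive definite, so that the Lyapunov operator $S\mapsto AS+SA$ has only positive eigenvalues. The helper name `horizontal_projection` and the random data are hypothetical.

```python
import numpy as np


def horizontal_projection(Y, G):
    """Orthogonal projection of G onto H_Y = {H : Y^T H symmetric}.

    Subtracts the vertical part Y S, where the skew-symmetric S solves
    (Y^T Y) S + S (Y^T Y) = Y^T G - G^T Y.
    """
    d, V = np.linalg.eigh(Y.T @ Y)                # Y^T Y = V diag(d) V^T, d > 0
    R = V.T @ (Y.T @ G - G.T @ Y) @ V             # right-hand side in eigenbasis
    S = V @ (R / (d[:, None] + d[None, :])) @ V.T
    return G - Y @ S


rng = np.random.default_rng(0)
n, r = 8, 3
Y = rng.standard_normal((n, r))                   # full column rank (generically)
sigma_r = np.linalg.svd(Y, compute_uv=False)[-1]  # smallest singular value of Y

# Random horizontal direction, scaled so that ||H||_F = 0.9 * sigma_r(Y).
H = horizontal_projection(Y, rng.standard_normal((n, r)))
H *= 0.9 * sigma_r / np.linalg.norm(H)

A = Y.T @ Y + Y.T @ H                             # the matrix in (3.2)
A = 0.5 * (A + A.T)                               # symmetrize away rounding errors
I = np.eye(r)
# Matrix of S -> A S + S A acting on column-stacked vec(S); eigenvalues a_i + a_j.
lyap = np.kron(I, A) + np.kron(A, I)
print("min eigenvalue of A:", np.linalg.eigvalsh(A).min())
print("min eigenvalue of Lyapunov operator:", np.linalg.eigvalsh(lyap).min())
```

Since the operator is invertible on all of $\mathbb{R}^{r\times r}$, it is in particular injective on skew-symmetric matrices, which is exactly what the proof needs.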

Finally, it is also possible to provide a lower bound on the radius of the largest ball around $X=YY^T$ whose intersection with $\mathcal{M}_r^+$ is contained in the image $\phi(Y+\mathcal{B}_Y)$ (so that the inverse map $\phi^{-1}$ is defined there).

Proposition 3.3.

Any $\tilde X\in\mathcal{M}_r^+$ satisfying $\|\tilde X-X\|_F<\frac{2\lambda_r(X)}{\sqrt{r+4}+\sqrt{r}}$ is in the image $\phi(Y+\mathcal{B}_Y)$, that is, there exists a unique $H\in\mathcal{B}_Y$ such that $\tilde X=(Y+H)(Y+H)^T$.

Observe that, since $\sqrt{r}<\sqrt{r+4}$, one could take

$$\|\tilde X-X\|_F\le\frac{\lambda_r(X)}{\sqrt{r+4}} \tag{3.3}$$

as a slightly cleaner sufficient condition in the proposition.
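As a small arithmetic illustration, the plain-Python sketch below (function names are ours) evaluates both admissible radii for $\lambda_r(X)=1$. Both scale linearly in $\lambda_r(X)$ and decay like $1/\sqrt{r}$; the radius in (3.3) is always the smaller one, and $2/(\sqrt{r+4}+\sqrt{r})$ coincides with $(\sqrt{r+4}-\sqrt{r})/2$.

```python
import math


def radius_prop_3_3(lam_r: float, r: int) -> float:
    """Radius 2*lambda_r(X) / (sqrt(r + 4) + sqrt(r)) from Proposition 3.3."""
    return 2.0 * lam_r / (math.sqrt(r + 4) + math.sqrt(r))


def radius_3_3(lam_r: float, r: int) -> float:
    """Simpler sufficient radius lambda_r(X) / sqrt(r + 4) from (3.3)."""
    return lam_r / math.sqrt(r + 4)


for r in (1, 5, 20, 100):
    print(f"r = {r:3d}: Prop. 3.3 -> {radius_prop_3_3(1.0, r):.4f}, "
          f"(3.3) -> {radius_3_3(1.0, r):.4f}")
```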

Proof.

Let $\tilde X=\tilde Z\tilde Z^T$ with $\tilde Z\in\mathbb{R}^{n\times r}$, and consider a polar decomposition $Y^T\tilde Z=P\tilde Q^T$, where $P,\tilde Q\in\mathbb{R}^{r\times r}$, $P$ is positive semidefinite, and $\tilde Q$ is orthogonal. Let $Z=\tilde Z\tilde Q$. Then

$$H=Z-Y \tag{3.4}$$

satisfies $(Y+H)(Y+H)^T=\tilde X$, and since $Y^TH=P-Y^TY$ is symmetric (note that $Y^TZ=Y^T\tilde Z\tilde Q=P$), we have $H\in\mathcal{H}_Y$. We need to show $H\in\mathcal{B}_Y$, that is, $\|H\|_F<\sigma_r(Y)$; Proposition 3.2 then implies that $H$ is unique in $\mathcal{B}_Y$. Let $YY^\dagger$ be the orthogonal projector onto the column span of $Y$, and let $Z_1=YY^\dagger Z$. With that, we have the decomposition

$$\|H\|_F^2=\|YY^\dagger H\|_F^2+\|(I-YY^\dagger)H\|_F^2=\|Z_1-Y\|_F^2+\|(I-YY^\dagger)Z\|_F^2. \tag{3.5}$$

We estimate both terms separately. Since $Y^TZ_1=Y^TZ=P$ is symmetric and positive semidefinite, the first term satisfies

$$\begin{aligned}
\|Z_1-Y\|_F^2 &= \|Z_1\|_F^2-2\operatorname{trace}(Y^TZ_1)+\|Y\|_F^2\\
&= \|(Z_1Z_1^T)^{1/2}\|_F^2-2\sum_{i=1}^r\sigma_i(Y^TZ_1)+\|(YY^T)^{1/2}\|_F^2.
\end{aligned} \tag{3.6}$$

A simple consideration using singular value decompositions of $Y$ and $Z_1$ reveals that

$$(YY^T)^{1/2}(Z_1Z_1^T)^{1/2}=\tilde U\,Y^TZ_1\,\tilde V^T$$

for some $\tilde U$ and $\tilde V$ with orthonormal columns. Consequently, by von Neumann's trace inequality (see, e.g., [32, Theorem 7.4.1.1]), we have

$$\operatorname{trace}\big((YY^T)^{1/2}(Z_1Z_1^T)^{1/2}\big)\le\sum_{i=1}^r\sigma_i(Y^TZ_1).$$

Inserting this into (3.6) yields

$$\|Z_1-Y\|_F^2\le\|(Z_1Z_1^T)^{1/2}-(YY^T)^{1/2}\|_F^2.$$
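The SVD identity and the resulting inequality can be checked numerically. In the NumPy sketch below (random data and helper names are hypothetical), the polar factor is computed from an SVD $Y^T\tilde Z=W\,\mathrm{diag}(s)\,R^T$ as $\tilde Q=RW^T$ (so that $P=W\,\mathrm{diag}(s)\,W^T$). One admissible choice in the identity is $\tilde U=U_YV_Y^T$ and $\tilde V=U_{Z_1}V_{Z_1}^T$, built from thin SVDs $Y=U_Y\Sigma_YV_Y^T$ and $Z_1=U_{Z_1}\Sigma_{Z_1}V_{Z_1}^T$; this explicit choice is our own instance of the "simple consideration" mentioned above.

```python
import numpy as np


def psd_sqrt(M, rel_tol=1e-12):
    """Square root of a symmetric PSD matrix; tiny eigenvalues are set to zero."""
    w, V = np.linalg.eigh(M)
    w = np.where(w > rel_tol * w.max(), w, 0.0)
    return (V * np.sqrt(w)) @ V.T


rng = np.random.default_rng(1)
n, r = 7, 3
Y = rng.standard_normal((n, r))
Ztilde = rng.standard_normal((n, r))

# Polar decomposition Y^T Ztilde = P Qtilde^T from the SVD W diag(s) Vh.
W, s, Vh = np.linalg.svd(Y.T @ Ztilde)
Qtilde = Vh.T @ W.T
Z = Ztilde @ Qtilde                      # Y^T Z = P is symmetric PSD
Z1 = Y @ np.linalg.pinv(Y) @ Z           # projection onto the column span of Y

# Explicit factors with orthonormal columns from thin SVDs of Y and Z1.
UY, _, VYh = np.linalg.svd(Y, full_matrices=False)
UZ, _, VZh = np.linalg.svd(Z1, full_matrices=False)
U_tilde, V_tilde = UY @ VYh, UZ @ VZh

sqrtYY, sqrtZZ = psd_sqrt(Y @ Y.T), psd_sqrt(Z1 @ Z1.T)
lhs = sqrtYY @ sqrtZZ
rhs = U_tilde @ (Y.T @ Z1) @ V_tilde.T

sum_sv = np.linalg.svd(Y.T @ Z1, compute_uv=False).sum()
print("identity holds:", np.allclose(lhs, rhs))
print("trace inequality:", np.trace(lhs), "<=", sum_sv)
print("||Z1 - Y||_F =", np.linalg.norm(Z1 - Y),
      "<=", np.linalg.norm(sqrtZZ - sqrtYY))
```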

We remark that this inequality could also have been concluded from [8, Theorem 2.7], where it is stated as well. By the same argument, it holds for any $Z_1$ for which $Y^TZ_1$ is symmetric and positive semidefinite (in particular, for $Z_1$ replaced with the initial $Z$). Now let $Y=U\Sigma V^T$ be a (thin) singular value decomposition of $Y$, with $\sigma_r(Y)$ the smallest positive singular value. Then $Z_1Z_1^T=US^2U^T$ for some positive semidefinite $S^2\in\mathbb{R}^{r\times r}$, and it follows from well-known results (cf. [50]) that¹

$$\|(Z_1Z_1^T)^{1/2}-(YY^T)^{1/2}\|_F^2=\|S-\Sigma\|_F^2\le\frac{1}{\sigma_r(Y)^2}\|S^2-\Sigma^2\|_F^2=\frac{1}{\sigma_r(Y)^2}\|Z_1Z_1^T-YY^T\|_F^2.$$

¹ For completeness, we provide the proof. Here $S$ denotes the positive semidefinite square root of $S^2$. The matrix $S-\Sigma$ is the unique solution of the matrix equation $\mathcal{L}(M)=SM+M\Sigma=S^2-\Sigma^2$. Indeed, the linear operator $\mathcal{L}$ on $\mathbb{R}^{r\times r}$ is symmetric in the Frobenius inner product and has the positive eigenvalues $\lambda_{i,j}=\lambda_i(S)+\Sigma_{jj}\ge\sigma_r(Y)$ (the eigenvectors are the rank-one matrices $w_ie_j^T$, with $w_i$ the eigenvectors of $S$). Hence $\|S^2-\Sigma^2\|_F=\|\mathcal{L}(S-\Sigma)\|_F\ge\sigma_r(Y)\|S-\Sigma\|_F$.
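The argument of the footnote can be mirrored in a few lines. The NumPy sketch below (random data, hypothetical helper `solve_L`) takes a diagonal $\Sigma$ with positive entries and a positive semidefinite $S$, solves $\mathcal{L}(M)=SM+M\Sigma=S^2-\Sigma^2$ in the eigenbasis of $S$, where $\mathcal{L}$ acts through the eigenvalues $\lambda_i(S)+\Sigma_{jj}$, and checks that the solution is $S-\Sigma$ and that $\|S-\Sigma\|_F\le\|S^2-\Sigma^2\|_F/\min_j\Sigma_{jj}$. Here $\min_j\Sigma_{jj}$ plays the role of $\sigma_r(Y)$.

```python
import numpy as np


def solve_L(S, sigma, R):
    """Solve S M + M diag(sigma) = R for symmetric PSD S and sigma > 0.

    With S = Wm diag(mu) Wm^T, the entries of Wm^T M are divided by mu_i + sigma_j.
    """
    mu, Wm = np.linalg.eigh(S)
    return Wm @ ((Wm.T @ R) / (mu[:, None] + sigma[None, :]))


rng = np.random.default_rng(2)
r = 4
sigma = np.sort(rng.uniform(0.5, 3.0, r))[::-1]   # plays the role of diag(Sigma)
Sigma = np.diag(sigma)
Qr, _ = np.linalg.qr(rng.standard_normal((r, r)))
S = Qr @ np.diag(rng.uniform(0.0, 3.0, r)) @ Qr.T  # symmetric PSD
S = 0.5 * (S + S.T)

M = solve_L(S, sigma, S @ S - Sigma @ Sigma)
lhs = np.linalg.norm(S - Sigma)
rhs = np.linalg.norm(S @ S - Sigma @ Sigma) / sigma.min()
print("M equals S - Sigma:", np.allclose(M, S - Sigma))
print(f"||S - Sigma||_F = {lhs:.4f} <= {rhs:.4f}")
```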

Noting that $Z_1Z_1^T=(YY^\dagger)\tilde X(YY^\dagger)$ and $YY^T=(YY^\dagger)X(YY^\dagger)$, we conclude the first part with

$$\|Z_1-Y\|_F^2\le\frac{1}{\sigma_r(Y)^2}\|(YY^\dagger)(\tilde X-X)(YY^\dagger)\|_F^2\le\frac{1}{\sigma_r(Y)^2}\|\tilde X-X\|_F^2. \tag{3.7}$$

The second term in (3.5) can be estimated as follows:

$$\begin{aligned}
\|(I-YY^\dagger)Z\|_F^2 &= \operatorname{trace}\big((I-YY^\dagger)\tilde X(I-YY^\dagger)\big)\\
&\le \sqrt{r}\,\|(I-YY^\dagger)\tilde X(I-YY^\dagger)\|_F\\
&= \sqrt{r}\,\|(I-YY^\dagger)(\tilde X-X)(I-YY^\dagger)\|_F\le\sqrt{r}\,\|\tilde X-X\|_F,
\end{aligned} \tag{3.8}$$

where we used the Cauchy–Schwarz inequality (applied to the eigenvalues), the fact that $(I-YY^\dagger)\tilde X(I-YY^\dagger)$ has rank at most $r$, and $(I-YY^\dagger)X(I-YY^\dagger)=0$.

As a result, combining (3.5) with (3.7) and (3.8), we obtain

$$\|H\|_F^2\le\frac{1}{\sigma_r(Y)^2}\|\tilde X-X\|_F^2+\sqrt{r}\,\|\tilde X-X\|_F. \tag{3.9}$$

Writing $d=\|\tilde X-X\|_F$, the right-hand side is strictly smaller than $\sigma_r(Y)^2$ if and only if $d^2+\sigma_r(Y)^2\sqrt{r}\,d-\sigma_r(Y)^4<0$, which for $d\ge0$ holds exactly when

$$\|\tilde X-X\|_F<-\frac{\sigma_r(Y)^2\sqrt{r}}{2}+\sqrt{\frac{\sigma_r(Y)^4\,r}{4}+\sigma_r(Y)^4}=\frac{\sigma_r(Y)^2}{2}\big(\sqrt{r+4}-\sqrt{r}\big)=\frac{2\lambda_r(X)}{\sqrt{r+4}+\sqrt{r}},$$

where we used $\lambda_r(X)=\sigma_r(Y)^2$. This proves the assertion. ∎
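The proof is constructive, and the NumPy sketch below follows it step by step under our own assumptions (random data, a hypothetical helper `horizontal_lift`, and a test point $\tilde X$ generated from a rotated small perturbation of $Y$). It computes $H=\tilde Z\tilde Q-Y$ via the polar decomposition, confirms $H\in\mathcal{H}_Y$ and $(Y+H)(Y+H)^T=\tilde X$, and checks the bound (3.9) as well as the conclusion $\|H\|_F<\sigma_r(Y)$ whenever $\tilde X$ lies within the radius of Proposition 3.3.

```python
import numpy as np


def horizontal_lift(Y, Ztilde):
    """H = Ztilde Qtilde - Y as in (3.4), with Y^T Ztilde = P Qtilde^T (polar)."""
    W, _, Vh = np.linalg.svd(Y.T @ Ztilde)
    return Ztilde @ (Vh.T @ W.T) - Y


rng = np.random.default_rng(3)
n, r, eps = 10, 3, 1e-3
Y = rng.standard_normal((n, r))
X = Y @ Y.T
Q0, _ = np.linalg.qr(rng.standard_normal((r, r)))        # arbitrary orthogonal factor
Ztilde = (Y + eps * rng.standard_normal((n, r))) @ Q0     # some factor of Xtilde
Xtilde = Ztilde @ Ztilde.T

H = horizontal_lift(Y, Ztilde)
sigma_r = np.linalg.svd(Y, compute_uv=False)[-1]
lam_r = sigma_r ** 2                                      # lambda_r(X) = sigma_r(Y)^2
dist = np.linalg.norm(Xtilde - X)
radius = 2 * lam_r / (np.sqrt(r + 4) + np.sqrt(r))        # Proposition 3.3
bound_39 = dist ** 2 / sigma_r ** 2 + np.sqrt(r) * dist   # right-hand side of (3.9)

print(f"||Xtilde - X||_F = {dist:.3e}, radius = {radius:.3e}")
print(f"||H||_F = {np.linalg.norm(H):.3e}, sigma_r(Y) = {sigma_r:.3e}")
print(f"||H||_F^2 = {np.linalg.norm(H) ** 2:.3e} <= {bound_39:.3e}")
```

The arbitrary rotation `Q0` is removed by the polar step, which is why any factor $\tilde Z$ of $\tilde X$ can be used as input.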

Remark 3.4.

From definition (3.4) of $H$, since $\tilde Q$ is given by the polar decomposition $Y^T\tilde Z=P\tilde Q^T$, it follows that

$$\|H\|_F=\|Y-\tilde Z\tilde Q\|_F=\min_{Q\in\mathcal{O}_r}\|Y-\tilde ZQ\|_F,$$

see, e.g., [32, section 7.4.5]. In general, given any $Y,\tilde Z\in\mathbb{R}^{n\times r}$, both of rank $r$, the minimizer $Z=\tilde Z\tilde Q$ in this problem is obtained by choosing $\tilde Q$ from the polar decomposition of $Y^T\tilde Z$, so that $Y^TZ$ is necessarily symmetric; that is, $Z$, and hence $Z-Y$, lie in the horizontal space $\mathcal{H}_Y$. In fact, the quantity $\min_{Q\in\mathcal{O}_r}\|Y-\tilde ZQ\|_F$ defines a Riemannian distance between the orbits $Y\mathcal{O}_r$ and $\tilde Z\mathcal{O}_r$ in the corresponding quotient manifold; see [39, Proposition 5.1].
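To illustrate the Procrustes characterization, the NumPy sketch below (random data, names ours) computes $\tilde Q$ from an SVD of $Y^T\tilde Z$, compares $\|Y-\tilde Z\tilde Q\|_F$ with the values obtained for randomly sampled orthogonal matrices, and checks the closed form $\|Y-\tilde Z\tilde Q\|_F^2=\|Y\|_F^2+\|\tilde Z\|_F^2-2\sum_i\sigma_i(Y^T\tilde Z)$ together with the symmetry of $Y^TZ$. The random sampling is only a sanity check, not a proof of optimality.

```python
import numpy as np

rng = np.random.default_rng(4)
n, r = 6, 3
Y = rng.standard_normal((n, r))
Ztilde = rng.standard_normal((n, r))

W, s, Vh = np.linalg.svd(Y.T @ Ztilde)          # Y^T Ztilde = W diag(s) Vh
Qtilde = Vh.T @ W.T                              # orthogonal polar factor
Z = Ztilde @ Qtilde
best = np.linalg.norm(Y - Z)

# Sanity check against random orthogonal matrices (Q factors of Gaussian matrices).
samples = (np.linalg.qr(rng.standard_normal((r, r)))[0] for _ in range(2000))
random_best = min(np.linalg.norm(Y - Ztilde @ Q) for Q in samples)
closed_form = np.sqrt(np.linalg.norm(Y) ** 2 + np.linalg.norm(Ztilde) ** 2 - 2 * s.sum())

print(f"polar: {best:.6f}, closed form: {closed_form:.6f}, best random: {random_best:.6f}")
```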

3.2 A time interval for the factorized problem

We now return to the factorized problem formulation (BM$_t$). Let $Y_t$ be an optimal solution of (BM$_t$) at some fixed time point $t$ (so that $Y_tY_t^T=X_t$ and $\operatorname{rank}Y_t=r$). Based on the above propositions, we are able to state a result on the admissible time interval $[t,t+\Delta t]$ on which the factorized problem (BM$_t$), restricted to the horizontal space $\mathcal{H}_{Y_t}$, is guaranteed to admit unique solutions corresponding to those of the original problem (SDP$_t$). For this, exploiting the smoothness of the curve $t\mapsto X_t$, we first define

$$L:=\max_{t\in[0,T]}\|\dot X_t\|_F, \tag{3.10}$$

a uniform bound on the time derivative, and we assume a uniform lower bound

$$\lambda_r(X_t)\ge\lambda_*>0 \tag{3.11}$$

on the smallest positive eigenvalue of $X_t$ for all $t\in[0,T]$. Notice that the existence of such bounds entails no further loss of generality: the existence of $L$ follows from (C1), which guarantees that $X_t$ is a smooth curve on the compact interval $[0,T]$, while the existence of $\lambda_*$ is guaranteed by (C2), since $X_t$ has constant rank $r$ and depends continuously on $t$.
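In practice, $L$ and $\lambda_*$ are rarely known in closed form. The NumPy sketch below estimates both on a time grid for a hypothetical smooth curve $X_t=Y(t)Y(t)^T$ with $Y(t)=Y_0+tV$; this curve is an assumption for illustration only and is not the solution path of any particular TV-SDP. It uses $\dot X_t=VY(t)^T+Y(t)V^T$ and $\lambda_r(X_t)=\sigma_r(Y(t))^2$. Grid values only approximate the maximum in (3.10) and the minimum in (3.11); they are not certified bounds.

```python
import numpy as np

rng = np.random.default_rng(5)
n, r, T = 10, 3, 1.0
Y0 = rng.standard_normal((n, r))
V = 0.1 * rng.standard_normal((n, r))       # hypothetical constant velocity of Y(t)


def Y_of(t):
    return Y0 + t * V


def X_of(t):
    Yt = Y_of(t)
    return Yt @ Yt.T


def Xdot_of(t):
    Yt = Y_of(t)
    return V @ Yt.T + Yt @ V.T               # derivative of Y(t) Y(t)^T


ts = np.linspace(0.0, T, 1001)
L_est = max(np.linalg.norm(Xdot_of(t)) for t in ts)                 # approximates (3.10)
lam_star_est = min(np.linalg.svd(Y_of(t), compute_uv=False)[-1] ** 2
                   for t in ts)                                     # approximates (3.11)
print(f"L ~ {L_est:.4f}, lambda_* ~ {lam_star_est:.4f}")
```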

Theorem 3.5.

Let $Y_t$ be a solution of (BM$_t$) as above. Then for $\Delta t<\frac{2\lambda_*}{L(\sqrt{r+4}+\sqrt{r})}$ there is a unique and smooth solution curve $s\mapsto Y_s$ for the problem (BM$_t$) restricted to $\mathcal{H}_{Y_t}$ in the time interval $s\in[t,t+\Delta t]$.

Proof.

It suffices to show that, for $s$ in the asserted time interval, the solutions $X_s$ of (SDP$_s$) lie in the image $\phi(Y_t+\mathcal{B}_{Y_t})$. By Proposition 3.3, this is the case if $\|X_s-X_t\|_F<\frac{2\lambda_r}{\sqrt{r+4}+\sqrt{r}}$, where $\lambda_r=\lambda_r(X_t)$ is the smallest positive eigenvalue of $X_t$. Since

$$\|X_s-X_t\|_F\le\int_t^s\|\dot X_\tau\|_F\,d\tau\le L(s-t),$$

and $\lambda_*\le\lambda_r$, the condition $s-t<\frac{2\lambda_*}{L(\sqrt{r+4}+\sqrt{r})}$ is sufficient. Then Proposition 3.2 provides the smooth solution curve $Y_s=\phi^{-1}(X_s)$ for problem (BM$_t$). ∎
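As a small numerical illustration of the step-size condition above, the following Python sketch evaluates $\frac{2\lambda_*}{L(\sqrt{r+4}+\sqrt{r})}$, which by rationalizing equals $\frac{\lambda_*(\sqrt{r+4}-\sqrt{r})}{2L}$. The function name `max_step` is hypothetical, and the constants $\lambda_*$ (a lower bound on $\lambda_r$) and $L$ (a bound on $\|\dot X_\tau\|_F$) are simply assumed to be given.

```python
import math


def max_step(lambda_star: float, L: float, r: int) -> float:
    """Upper bound on Delta t: 2*lambda_star / (L*(sqrt(r+4) + sqrt(r))).

    lambda_star: assumed lower bound on the smallest positive eigenvalue of X_t,
    L:           assumed bound on ||dX/dtau||_F along the path,
    r:           rank of the factorization Y in R^{n x r}.
    """
    if lambda_star <= 0 or L <= 0 or r < 1:
        raise ValueError("need lambda_star > 0, L > 0 and r >= 1")
    return 2.0 * lambda_star / (L * (math.sqrt(r + 4) + math.sqrt(r)))


# r = 5: sqrt(9) + sqrt(5) = 3 + sqrt(5), so the bound is (3 - sqrt(5)) / 2.
print(max_step(lambda_star=1.0, L=1.0, r=5))
```

The bound shrinks like $\lambda_*/(L\sqrt{r})$ for large $r$, so higher ranks or faster-moving solutions force smaller time steps.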

The results of this section motivate the definition of a version of (BM$_t$) restricted to $\mathcal{H}_{Y_t}$, which we provide in the next section.

4 Path following the trajectory of solutions

In this section, we present a path-following procedure that computes a sequence of approximate solutions $\{\hat Y_0,\dots,\hat Y_k,\dots,\hat Y_K\}$ at successive time points, tracking a trajectory of solutions $t\mapsto Y_t$ of the Burer–Monteiro reformulation (BM$_t$). From this sequence we then reconstruct a corresponding sequence of approximate solutions $\hat X_k=\hat Y_k\hat Y_k^T$ that tracks the trajectory of solutions $t\mapsto X_t$ of the full-space TV-SDP problem (SDP$_t$). The path-following method is based on iteratively solving the linearized KKT system. Given an iterate $Y_t$ on the path, we explained in the previous section how the nonuniqueness of the path on a small time interval $[t,t+\Delta t]$ is eliminated by restricting problem (BM$_t$) to the horizontal space $\mathcal{H}_{Y_t}$.
It remains to ensure that this restriction also guarantees that the linearized KKT system admits a unique solution. We show in Theorem 4.2 that this is indeed the case under standard regularity assumptions on the original problem (SDP$_t$), a remarkable fact of some independent interest.

4.1 Linearized KKT conditions and second-order sufficiency

Given an optimal solution $X_t=Y_tY_t^T$ at time $t$, we aim to find a solution $X_{t+\Delta t}=Y_{t+\Delta t}Y_{t+\Delta t}^T$ at time $t+\Delta t$. By the results of the previous section, the next solution can be expressed in a unique way as

$$Y_{t+\Delta t}=Y_t+\Delta Y,$$

where $\Delta Y$ lies in the horizontal space $\mathcal{H}_{Y_t}$, provided that $\Delta t$ is small enough.

We define the following maps:

$$\begin{aligned}
f_{t+\Delta t}(Y)&\coloneqq\langle C_{t+\Delta t},YY^T\rangle,\\
g_{t+\Delta t}(Y)&\coloneqq\mathcal{A}_{t+\Delta t}(YY^T)-b_{t+\Delta t},\\
h_{Y_t}(Y)&\coloneqq Y_t^TY-Y^TY_t.
\end{aligned}\tag{4.1}$$
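A minimal NumPy sketch of the maps in (4.1), assuming the linear map is given by symmetric matrices $A_1,\dots,A_m$ via $\mathcal{A}(X)_i=\langle A_i,X\rangle$ (Frobenius inner product). The toy data, a $2\times 2$ max-cut-type instance with $\mathcal{A}(X)=\operatorname{diag}(X)$, $b=(1,1)$ and rank-one solution $X_t=\mathbf{1}\mathbf{1}^T$, and all function names are illustrative assumptions rather than part of the method.

```python
import numpy as np


def A_op(A_list, X):
    """Linear map A(X) = (<A_1, X>, ..., <A_m, X>) with Frobenius inner products."""
    return np.array([np.sum(Ai * X) for Ai in A_list])


def f(C, Y):
    """Objective f(Y) = <C, Y Y^T>."""
    return float(np.sum(C * (Y @ Y.T)))


def g(A_list, b, Y):
    """Constraint residual g(Y) = A(Y Y^T) - b."""
    return A_op(A_list, Y @ Y.T) - b


def h(Yt, Y):
    """Horizontal-space map h_{Y_t}(Y) = Y_t^T Y - Y^T Y_t (an r x r skew matrix)."""
    return Yt.T @ Y - Y.T @ Yt


# Hypothetical toy data at time t + Delta t: n = 2, r = 1, A(X) = diag(X), b = (1, 1).
C = -np.array([[0.0, 1.0], [1.0, 0.0]])
A_list = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
b = np.ones(2)
Yt = np.array([[1.0], [1.0]])  # X_t = Y_t Y_t^T is the all-ones matrix

print(f(C, Yt), g(A_list, b, Yt), h(Yt, Yt))
```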

By definition, $\Delta Y\in\mathcal{H}_{Y_t}$ if and only if $h_{Y_t}(\Delta Y)=0$. For symmetry reasons we use the equivalent condition $h_{Y_t}(Y_t+\Delta Y)=0$; the two coincide because $h_{Y_t}$ is linear and $h_{Y_t}(Y_t)=0$. This reflects the fact that $Y_t\in\mathcal{H}_{Y_t}$, so that $Y_t+\mathcal{H}_{Y_t}$ is actually the linear space $\mathcal{H}_{Y_t}$ itself.
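The equivalence can be checked numerically. The sketch below (random data, hypothetical helper `is_horizontal`) verifies that $h_{Y_t}(\Delta Y)=h_{Y_t}(Y_t+\Delta Y)$, that $Y_t$ itself is horizontal, and that a nonzero direction $Y_t\Omega$ tangent to the orbit $Y_t\mathcal{O}_r$ ($\Omega$ skew-symmetric) is not, since $h_{Y_t}(Y_t\Omega)=Y_t^TY_t\Omega+\Omega Y_t^TY_t\neq 0$ when $Y_t$ has full column rank.

```python
import numpy as np


def h(Yt, Y):
    """h_{Y_t}(Y) = Y_t^T Y - Y^T Y_t."""
    return Yt.T @ Y - Y.T @ Yt


def is_horizontal(Yt, dY, tol=1e-12):
    """dY lies in H_{Y_t} iff Y_t^T dY is symmetric, i.e. h_{Y_t}(dY) = 0."""
    return bool(np.linalg.norm(h(Yt, dY)) <= tol)


rng = np.random.default_rng(0)
n, r = 5, 2
Yt = rng.standard_normal((n, r))
dY = rng.standard_normal((n, r))

# h_{Y_t} is linear with h_{Y_t}(Y_t) = 0, so the two conditions coincide.
same = np.allclose(h(Yt, dY), h(Yt, Yt + dY))

# Y_t itself is horizontal; a nonzero orbit direction Y_t @ Omega is not.
Omega = np.array([[0.0, 1.0], [-1.0, 0.0]])
print(same, is_horizontal(Yt, Yt), is_horizontal(Yt, Yt @ Omega))
```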

To find the new iterate $Y_{t+\Delta t}$ we hence consider the problem

$$\begin{aligned}
\min_{Y\in\mathbb{R}^{n\times r}}\quad & f_{t+\Delta t}(Y)\\
\text{s.t.}\quad & g_{t+\Delta t}(Y)=0,\\
& h_{Y_t}(Y)=0.
\end{aligned}\tag{BM$_{Y_t,t+\Delta t}$}$$

This is a quadratically constrained quadratic problem whose Lagrangian is

$$\mathcal{L}_{Y_t,t+\Delta t}(Y,\lambda,\mu):=f_{t+\Delta t}(Y)-\langle\lambda,g_{t+\Delta t}(Y)\rangle-\langle\mu,h_{Y_t}(Y)\rangle\tag{4.2}$$

with multipliers $\lambda\in\mathbb{R}^m$ and $\mu\in\mathbb{S}^r_{\mathrm{skew}}$. The KKT conditions of problem (BM$_{Y_t,t+\Delta t}$) are

$$\begin{aligned}
\nabla_Y\mathcal{L}_{Y_t,t+\Delta t}(Y,\lambda,\mu)&=0,\\
g_{t+\Delta t}(Y)&=0,\\
h_{Y_t}(Y)&=0.
\end{aligned}\tag{4.3}$$

Hence, (4.3) reads explicitly as

$$\mathcal{F}_{Y_t,t+\Delta t}(Y,\lambda,\mu)\coloneqq\begin{bmatrix}2C_{t+\Delta t}Y-2\mathcal{A}^*_{t+\Delta t}(\lambda)Y-2Y_t\mu\\ \mathcal{A}_{t+\Delta t}(YY^T)-b_{t+\Delta t}\\ Y_t^TY-Y^TY_t\end{bmatrix}=0.$$
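For concreteness, the residual $\mathcal{F}_{Y_t,t+\Delta t}$ can be evaluated directly from its explicit form. This sketch assumes $\mathcal{A}^*(\lambda)=\sum_i\lambda_iA_i$ for symmetric $A_i$ and uses a hypothetical $2\times 2$ toy instance ($\mathcal{A}(X)=\operatorname{diag}(X)$, $b=(1,1)$, $C=-\bigl[\begin{smallmatrix}0&1\\1&0\end{smallmatrix}\bigr]$, $Y_t=(1,1)^T$). At $\Delta t=0$ the triple $(Y_t,\lambda_t,0)$ with $\lambda_t=(-1,-1)$ makes all three blocks vanish, in line with the first claim of Theorem 4.2.

```python
import numpy as np


def A_op(A_list, X):
    return np.array([np.sum(Ai * X) for Ai in A_list])


def A_adj(A_list, lam):
    """Adjoint A^*(lam) = sum_i lam_i A_i (assumes m >= 1)."""
    return sum(l * Ai for l, Ai in zip(lam, A_list))


def kkt_residual(C, A_list, b, Yt, Y, lam, mu):
    """Three blocks of F_{Y_t, t+Delta t}(Y, lam, mu); (C, A_list, b) are time-(t+Delta t) data."""
    r1 = 2 * C @ Y - 2 * A_adj(A_list, lam) @ Y - 2 * Yt @ mu  # stationarity
    r2 = A_op(A_list, Y @ Y.T) - b                              # g(Y)
    r3 = Yt.T @ Y - Y.T @ Yt                                    # h_{Y_t}(Y)
    return r1, r2, r3


# Hypothetical toy instance at Delta t = 0.
C = -np.array([[0.0, 1.0], [1.0, 0.0]])
A_list = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
b = np.ones(2)
Yt = np.array([[1.0], [1.0]])
lam_t = np.array([-1.0, -1.0])
print(kkt_residual(C, A_list, b, Yt, Yt, lam_t, np.zeros((1, 1))))
```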

The linearization of (4.3) at $(Y_t,\lambda_t,\mu_t)$ leads to the linear system

$$\mathcal{J}_{Y_t,t+\Delta t}(Y_t,\lambda_t,\mu_t)\begin{bmatrix}\Delta Y\\ \lambda_t+\Delta\lambda\\ \mu_t+\Delta\mu\end{bmatrix}=\begin{bmatrix}-\nabla_Yf_{t+\Delta t}(Y_t)\\ g_{t+\Delta t}(Y_t)\\ 0\end{bmatrix},\tag{4.4}$$

where $\mathcal{J}_{Y_t,t+\Delta t}(Y,\lambda,\mu)$ denotes the derivative of $\mathcal{F}_{Y_t,t+\Delta t}$ at $(Y,\lambda,\mu)$, with the signs of its last two block rows reversed so that the operator is self-adjoint (this is why $g_{t+\Delta t}(Y_t)$ rather than $-g_{t+\Delta t}(Y_t)$ appears on the right-hand side of (4.4)). Note that it does not actually depend on $\mu$, but we keep this notation for consistency. As a linear operator on $\mathbb{R}^{n\times r}\times\mathbb{R}^m\times\mathbb{S}^r_{\mathrm{skew}}$, $\mathcal{J}_{Y_t,t+\Delta t}(Y,\lambda,\mu)$ can be written in block matrix notation as follows:

$$\mathcal{J}_{Y_t,t+\Delta t}(Y,\lambda,\mu):=\begin{bmatrix}\nabla^2_Y\mathcal{L}_{Y_t,t+\Delta t}(\lambda) & -g'_{t+\Delta t}(Y)^* & -h_{Y_t}^*\\ -g'_{t+\Delta t}(Y) & 0 & 0\\ -h_{Y_t} & 0 & 0\end{bmatrix},$$

where from (4.1) and (4.2) one derives

$$\begin{aligned}
\nabla^2_Y\mathcal{L}_{Y_t,t+\Delta t}(\lambda)&\colon H\mapsto 2\bigl(C_{t+\Delta t}-\mathcal{A}^*_{t+\Delta t}(\lambda)\bigr)H,\\
g'_{t+\Delta t}(Y)&\colon H\mapsto\mathcal{A}_{t+\Delta t}(YH^T+HY^T),\\
h_{Y_t}&\colon H\mapsto Y_t^TH-H^TY_t,\\
g'_{t+\Delta t}(Y)^*&\colon\lambda\mapsto 2\mathcal{A}^*_{t+\Delta t}(\lambda)Y,\\
h_{Y_t}^*&\colon\mu\mapsto 2Y_t\mu.
\end{aligned}$$

For later reference, observe that as a bilinear form $\nabla^2_Y\mathcal{L}_{Y_t,t+\Delta t}$ reads

$$\nabla^2_Y\mathcal{L}_{Y_t,t+\Delta t}(\lambda)[H,H]=2\operatorname{trace}\bigl(H^T(C_{t+\Delta t}-\mathcal{A}^*_{t+\Delta t}(\lambda))H\bigr).$$
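Because $\mathcal{L}_{Y_t,t+\Delta t}$ is quadratic in $Y$ (and $h_{Y_t}$ is linear), the bilinear form above can be checked against a central second difference of the Lagrangian along a direction $H$, which has no truncation error for quadratics. The random data and function names below are assumptions for illustration only.

```python
import numpy as np


def lagrangian(C, A_list, b, Yt, Y, lam, mu):
    """L(Y, lam, mu) = <C, YY^T> - <lam, A(YY^T) - b> - <mu, Y_t^T Y - Y^T Y_t>."""
    X = Y @ Y.T
    g = np.array([np.sum(Ai * X) for Ai in A_list]) - b
    h = Yt.T @ Y - Y.T @ Yt
    return np.sum(C * X) - lam @ g - np.sum(mu * h)


def hess_form(C, A_list, lam, H):
    """Bilinear form 2 trace(H^T (C - A^*(lam)) H)."""
    Z = C - sum(l * Ai for l, Ai in zip(lam, A_list))
    return 2.0 * np.trace(H.T @ Z @ H)


def sym(M):
    return (M + M.T) / 2


rng = np.random.default_rng(1)
n, r, m = 4, 2, 3
C = sym(rng.standard_normal((n, n)))
A_list = [sym(rng.standard_normal((n, n))) for _ in range(m)]
b = rng.standard_normal(m)
Yt, Y, H = (rng.standard_normal((n, r)) for _ in range(3))
lam = rng.standard_normal(m)
mu = np.array([[0.0, 0.7], [-0.7, 0.0]])  # any skew-symmetric multiplier


def L_along(s):
    return lagrangian(C, A_list, b, Yt, Y + s * H, lam, mu)


# L is exactly quadratic in Y, so the step s only affects rounding.
s = 0.5
second_diff = (L_along(s) - 2 * L_along(0.0) + L_along(-s)) / s**2
print(second_diff, hess_form(C, A_list, lam, H))
```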

Solving (4.4) for the updates $(Y_t+\Delta Y,\lambda_t+\Delta\lambda,\mu_t+\Delta\mu)$ is equivalent to applying one step of Newton's method to the KKT system (4.3) (the Lagrange–Newton method).
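For small instances, the Lagrange–Newton step can be sketched by assembling $\mathcal{J}_{Y_t,t+\Delta t}(Y_t,\lambda_t,\mu_t)$ column by column from the block formulas above and solving (4.4) densely. This is an illustrative assumption, not the authors' implementation: the dense solve costs $O(N^3)$ with $N=nr+m+r(r-1)/2$, and a practical solver would likely exploit structure. A skew-symmetric $\mu$ is parametrized by its strictly upper-triangular entries. On a hypothetical $2\times2$ toy instance the step at $\Delta t=0$ returns $(Y_t,\lambda_t,0)$, and scaling $C$ by $1.1$ moves only the multiplier; a second instance with $r=2$ is built to be a KKT point by construction, assuming generic random data.

```python
import numpy as np


def skew_from_vec(v, r):
    """Skew-symmetric r x r matrix from its r(r-1)/2 strictly upper-triangular entries."""
    M = np.zeros((r, r))
    M[np.triu_indices(r, k=1)] = v
    return M - M.T


def skew_to_vec(M):
    return M[np.triu_indices(M.shape[0], k=1)]


def A_adj(A_list, lam):
    """A^*(lam) = sum_i lam_i A_i (assumes m >= 1)."""
    return sum(l * Ai for l, Ai in zip(lam, A_list))


def newton_kkt_step(C, A_list, b, Yt, lam_t):
    """Solve (4.4) at (Y_t, lam_t); return (Y_t + dY, lam_t + dlam, mu_t + dmu).

    (C, A_list, b) are the data at time t + Delta t; J is assembled densely.
    """
    n, r = Yt.shape
    m, p = len(A_list), r * (r - 1) // 2
    Z = C - A_adj(A_list, lam_t)  # Hessian of the Lagrangian: H -> 2 Z H

    def apply_J(x):
        H = x[:n * r].reshape(n, r)
        lam = x[n * r:n * r + m]
        mu = skew_from_vec(x[n * r + m:], r)
        row1 = 2 * Z @ H - 2 * A_adj(A_list, lam) @ Yt - 2 * Yt @ mu
        row2 = -np.array([np.sum(Ai * (Yt @ H.T + H @ Yt.T)) for Ai in A_list])
        row3 = -(Yt.T @ H - H.T @ Yt)
        return np.concatenate([row1.ravel(), row2, skew_to_vec(row3)])

    N = n * r + m + p
    J = np.column_stack([apply_J(e) for e in np.eye(N)])
    rhs = np.concatenate([
        (-2 * C @ Yt).ravel(),                                      # -grad f(Y_t)
        np.array([np.sum(Ai * (Yt @ Yt.T)) for Ai in A_list]) - b,  # g(Y_t)
        np.zeros(p),
    ])
    sol = np.linalg.solve(J, rhs)  # raises LinAlgError if J is singular
    dY = sol[:n * r].reshape(n, r)
    return Yt + dY, sol[n * r:n * r + m], skew_from_vec(sol[n * r + m:], r)


# Hypothetical toy instance (r = 1, so mu is 1 x 1 zero): X_t all-ones, lam_t = (-1, -1).
C = -np.array([[0.0, 1.0], [1.0, 0.0]])
A_list = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
b, Yt, lam_t = np.ones(2), np.array([[1.0], [1.0]]), np.array([-1.0, -1.0])
Y1, lam1, mu1 = newton_kkt_step(C, A_list, b, Yt, lam_t)        # Delta t = 0
Y2, lam2, mu2 = newton_kkt_step(1.1 * C, A_list, b, Yt, lam_t)  # perturbed cost

# KKT point with r = 2 by construction: Z >= 0, Z Y = 0, C = Z + A^*(lam), b = A(YY^T).
rng = np.random.default_rng(0)
n, r, m = 4, 2, 4
Yr = rng.standard_normal((n, r))
Nb = np.linalg.qr(Yr, mode="complete")[0][:, r:]  # orthonormal basis of range(Yr)^perp
Ar = [(M + M.T) / 2 for M in rng.standard_normal((m, n, n))]
lam_r = rng.standard_normal(m)
Cr = Nb @ np.diag([1.0, 2.0]) @ Nb.T + A_adj(Ar, lam_r)
br = np.array([np.sum(Ai * (Yr @ Yr.T)) for Ai in Ar])
Y3, lam3, mu3 = newton_kkt_step(Cr, Ar, br, Yr, lam_r)
```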

Our aim in this subsection is to show that, for $\Delta t$ small enough, the system (4.4) is uniquely solvable when $(Y_t,\lambda_t)$ is a KKT pair for the overparametrized problem (BM$_t$). Since the system depends continuously on $\Delta t$ and invertibility of a linear operator is preserved under small perturbations, it suffices to show that it admits a unique solution for $\Delta t=0$. This corresponds to proving second-order sufficient conditions for optimality in problem (BM$_{Y_t,t+\Delta t}$) at $\Delta t=0$. Interestingly, these can be related to standard regularity hypotheses on the original semidefinite problem (SDP$_t$). For this we first need a uniqueness statement for the Lagrange multiplier $\lambda_t$.

Lemma 4.1.

Given an optimal solution $X_t=Y_tY_t^T$ to (SDP$_t$), suppose that $X_t$ is the unique (see consequence (C1)) and primal nondegenerate (see Definition 2.3 and assumption (A3)) solution. Then there is a unique optimal Lagrange multiplier $\lambda_t$ for (BM$_t$), independent of the choice of $Y_t$ in the orbit $Y_t\mathcal{O}_r$. Moreover, $Z(\lambda_t)=C_t-\mathcal{A}_t^*(\lambda_t)$ is the unique dual solution to (D-SDP$_t$).

Proof.

We start by recalling that the optimal set for (BM$_t$) coincides with $Y_t\mathcal{O}_r$. Since the KKT conditions for (BM$_t$) are just

$$\nabla_Yf_t(Y)-\nabla_Y\langle\lambda,g_t(Y)\rangle=2\bigl(C_t-\mathcal{A}_t^*(\lambda)\bigr)Y=0$$

(together with $g_t(Y)=0$), the set of all optimal dual multipliers for (BM$_t$) is

$$\{\lambda\colon(C_t-\mathcal{A}_t^*(\lambda))Y_tQ=0,\ Q\in\mathcal{O}_r\}=\{\lambda\colon(C_t-\mathcal{A}_t^*(\lambda))Y_t=0\}.$$

To show that this set is a singleton, it suffices to prove that the homogeneous equation $\mathcal{A}^*_t(\lambda)Y_t=0$ has only the zero solution, since the difference of any two multipliers in the set solves it. By (2.2), primal nondegeneracy of $X_t$ can be stated as

$$\operatorname{im}\mathcal{A}^*_t\cap\mathcal{T}^\perp_{X_t}=\{0\},$$

where $\mathcal{T}^\perp_{X_t}=\{M\in\mathbb{S}^n\mid MX_t=0\}$. Since $\mathcal{A}^*_t(\lambda)Y_t=0$ implies $\mathcal{A}^*_t(\lambda)X_t=\mathcal{A}^*_t(\lambda)Y_tY_t^T=0$, and hence $\mathcal{A}^*_t(\lambda)\in\operatorname{im}(\mathcal{A}^*_t)\cap\mathcal{T}^\perp_{X_t}$, we get $\mathcal{A}^*_t(\lambda)=0$ and thus $\lambda=0$, since $\mathcal{A}_t^*$ is injective by assumption (A2).
To prove the second statement, observe that by primal nondegeneracy (D-SDP$_t$) has a unique solution $Z(w_t)$ which, by assumption (A2), corresponds to a unique dual multiplier vector $w_t$ (see Theorem 7 in [4]). Furthermore, $Z(w_t)$ satisfies $Z(w_t)X_t=(C_t-\mathcal{A}^*_t(w_t))Y_tY_t^T=0$ by (2.1). Since $Y_t$ has full column rank if $r$ is chosen equal to $r^*=\operatorname{rank}X_t$, this implies that $(C_t-\mathcal{A}^*_t(w_t))Y_t=0$.
From the first statement it then follows that $w_t=\lambda_t$. ∎
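A simple numerical counterpart of Lemma 4.1, assumed here for illustration, is to recover $\lambda_t$ from a factor $Y_t$ by solving $\mathcal{A}_t^*(\lambda)Y_t=C_tY_t$ in the least-squares sense; the coefficient matrix has columns $\operatorname{vec}(A_iY_t)$, and under the lemma's hypotheses it has full column rank $m$, so the solution is the unique multiplier. The sketch (hypothetical function name, $2\times2$ max-cut-type toy instance with $\mathcal{A}(X)=\operatorname{diag}(X)$ and $X_t=\mathbf{1}\mathbf{1}^T$) also checks that $Z(\lambda_t)=C_t-\mathcal{A}_t^*(\lambda_t)$ is positive semidefinite.

```python
import numpy as np


def multiplier_from_factor(C, A_list, Yt):
    """Least-squares solution of (C - A^*(lam)) Y_t = 0, i.e. sum_i lam_i A_i Y_t = C Y_t.

    Returns lam, the rank of the coefficient matrix (rank m means lam -> A^*(lam) Y_t
    is injective, so lam is unique) and the residual norm (0 if a multiplier exists).
    """
    M = np.column_stack([(Ai @ Yt).ravel() for Ai in A_list])
    rhs = (C @ Yt).ravel()
    lam, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return lam, np.linalg.matrix_rank(M), np.linalg.norm(M @ lam - rhs)


# Hypothetical toy instance: A(X) = diag(X), X_t = all-ones, r = 1.
C = -np.array([[0.0, 1.0], [1.0, 0.0]])
A_list = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
Yt = np.array([[1.0], [1.0]])

lam, rank, residual = multiplier_from_factor(C, A_list, Yt)
Z = C - sum(l * Ai for l, Ai in zip(lam, A_list))  # dual slack Z(lam_t)
print(lam, rank, residual, np.linalg.eigvalsh(Z))
```

Here $\operatorname{rank}X_t+\operatorname{rank}Z(\lambda_t)=1+1=n$, so this toy pair is also strictly complementary.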

We can now state and prove the main result of this subsection.

Theorem 4.2.

Let $(X_t = Y_tY_t^T, Z_t)$ be a strictly complementary (see Definition 2.2) optimal primal-dual pair of solutions to (SDP$_t$)–(D-SDP$_t$) such that $X_t$ is a primal nondegenerate solution. Let $\lambda_t$ be the unique corresponding Lagrange multiplier for (BM$_t$) according to Lemma 4.1. Then the triple $(Y_t, \lambda_t, \mu_t = 0)$ is a KKT triple for (BM$_{Y_t,t+\Delta t}$) at $\Delta t = 0$ (that is, $\mathcal{F}_{Y_t,t}(Y_t,\lambda_t,0) = 0$) and fulfills the second-order sufficient conditions:

$$\nabla^2_Y \mathcal{L}_{Y_t,t}(\lambda_t)[H,H] = \operatorname{trace}\big(H^T(C_t - \mathcal{A}^*_t(\lambda_t))H\big) > 0 \tag{4.5}$$

for all $H \in \mathbb{R}^{n\times r}\setminus\{0\}$ satisfying $\mathcal{A}_t(Y_tH^T + HY_t^T) = 0$ and $Y_t^TH - H^TY_t = 0$. In particular, $\mathcal{J}_{Y_t,t}(Y_t,\lambda_t,0)$ is invertible.

Proof.

Since $(C_t - \mathcal{A}^*_t(\lambda_t))Y_t = Z(\lambda_t)Y_t = 0$ by the KKT conditions for (BM$_t$) and $h_{Y_t}(Y_t) = 0$, it follows immediately that $\mathcal{F}_{Y_t,t}(Y_t,\lambda_t,0) = 0$. It is well known that the linearized KKT system (4.4) admits a unique solution if (and only if) the second-order sufficient conditions (4.5) hold; see, e.g., [43, Lemma 16.1]. Since $(X_t, Z(\lambda_t))$ is an optimal solution for the original primal-dual pair of SDPs, it satisfies the second-order necessary conditions for optimality (that is, $Z(\lambda_t) \succeq 0$), and hence (4.5) holds with "$\geq$". Assume that

$$\operatorname{trace}\big(H^T(C_t - \mathcal{A}_t^*(\lambda_t))H\big) = \operatorname{trace}\big(H^TZ(\lambda_t)H\big) = 0$$

for some $H \in \mathbb{R}^{n\times r}$ satisfying $\mathcal{A}_t(Y_tH^T + HY_t^T) = 0$ and $Y_t^TH - H^TY_t = 0$. Since $Z_t = Z(\lambda_t)$ is positive semidefinite, this forces $Z(\lambda_t)^{1/2}H = 0$, so the columns of $H$ must belong to the kernel of $Z(\lambda_t)$. By strict complementarity they hence belong to the column space of $X_t$, which is equal to the column space of $Y_t$. Therefore $H = Y_tP$ for some matrix $P \in \mathbb{R}^{r\times r}$. Consider now the matrix

$$\tilde{X} = X_t + s(Y_tH^T + HY_t^T) = Y_t\big[I_r + s(P^T + P)\big]Y_t^T,$$

depending on a real parameter $s$. Clearly, $\mathcal{A}_t(\tilde{X}) = b_t$ and, for nonzero $|s|$ small enough, $\tilde{X}$ is positive semidefinite. Furthermore, for a suitable choice of the sign of $s$, we have $\langle C_t, \tilde{X}\rangle \leq \langle C_t, X_t\rangle$. Since $X_t$ is the unique solution of (SDP$_t$), this implies $\tilde{X} = X_t$, and thus $Y_tH^T + HY_t^T$ must be zero. Since $H \in \mathcal{H}_{Y_t}$, Proposition 3.1 yields $H = 0$, and this completes the proof. ∎
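To see Theorem 4.2 at work on a small instance, the following Python/NumPy sketch builds a hypothetical random SDP whose strictly complementary solution is known by construction ($X = YY^T$ of rank $r$, and $Z \succeq 0$ with $ZY = 0$ and rank $n - r$), assuming $\mathcal{A}_t(X) = (\langle A_i, X\rangle)_{i=1}^m$ with symmetric $A_i$. It computes the smallest eigenvalue of the Hessian in (4.5) restricted to the subspace of admissible $H$, and the smallest singular value of a symmetric KKT matrix with blocks $[[K, -J^T], [-J, 0]]$, where $K$ represents $H \mapsto \operatorname{trace}(H^TZH)$ and $J$ stacks the two linear constraints on $H$. The names `synthetic_kkt_point` and `constraint_jacobian`, the orthonormal basis used for the skew-symmetric multiplier, and this block scaling are illustrative assumptions, not necessarily the exact form of (4.4). For generic random data, both quantities are expected to be positive.

```python
import numpy as np

def synthetic_kkt_point(n=5, r=2, m=5, seed=0):
    """Hypothetical random SDP with a known strictly complementary solution:
    X = Y Y^T of rank r and Z >= 0 with Z Y = 0 and rank n - r."""
    rng = np.random.default_rng(seed)
    Y = rng.standard_normal((n, r))
    Qf, _ = np.linalg.qr(np.hstack([Y, rng.standard_normal((n, n - r))]))
    V = Qf[:, r:]                                   # orthonormal basis of range(Y)^perp
    Z = V @ np.diag(rng.uniform(1.0, 2.0, n - r)) @ V.T
    A = [(M + M.T) / 2 for M in rng.standard_normal((m, n, n))]
    lam = rng.standard_normal(m)
    C = Z + sum(l * Ai for l, Ai in zip(lam, A))    # ensures Z = C - A^*(lam)
    return Y, Z, A, lam, C

def constraint_jacobian(Y, A):
    """Rows of H -> A(Y H^T + H Y^T) and H -> <E_jk, Y^T H - H^T Y>, where E_jk is
    an orthonormal basis of skew-symmetric r x r matrices (row-major vec(H))."""
    n, r = Y.shape
    rows = [2 * (Ai @ Y).ravel() for Ai in A]       # <A_i, Y H^T + H Y^T> = <2 A_i Y, H>
    for j in range(r):
        for k in range(j + 1, r):
            E = np.zeros((r, r))
            E[j, k], E[k, j] = 1 / np.sqrt(2), -1 / np.sqrt(2)
            rows.append(2 * (Y @ E).ravel())        # <E, Y^T H - H^T Y> = <2 Y E, H>
    return np.array(rows)

Y, Z, A, lam, C = synthetic_kkt_point()
n, r = Y.shape
K = np.kron(Z, np.eye(r))           # matrix of H -> trace(H^T Z H) in row-major vec(H)
Jc = constraint_jacobian(Y, A)
_, s, Vt = np.linalg.svd(Jc)
N = Vt[np.sum(s > 1e-10):].T        # basis of the admissible H in (4.5)
sosc_min = np.linalg.eigvalsh(N.T @ K @ N).min()
kkt = np.block([[K, -Jc.T], [-Jc, np.zeros((Jc.shape[0], Jc.shape[0]))]])
kkt_min_sv = np.linalg.svd(kkt, compute_uv=False).min()
print(sosc_min, kkt_min_sv)         # both should be positive for generic data
```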

Corollary 4.3.

Let the assumptions of Theorem 4.2 be satisfied. Then for $\Delta t > 0$ small enough (and depending on $Y_t$), system (4.4), that is, the operator $\mathcal{J}_{Y_t,t+\Delta t}(Y_t,\lambda_t,0)$, is invertible.

Clearly, this is only a qualitative result. An upper bound on admissible $\Delta t$ could be expressed in terms of the spectral norm of the inverse of $\mathcal{J}_{Y_t,t}(Y_t,\lambda_t,0)$ using perturbation arguments. This would require a lower bound on the absolute values of the eigenvalues of $\mathcal{J}_{Y_t,t}(Y_t,\lambda_t,0)$. In this context, we note that the eigenvalues, and hence also the condition number, of $\mathcal{J}_{Y_t,t+\Delta t}(Y_t,\lambda_t,0)$ (for sufficiently small $\Delta t$ as above) do not depend on the particular choice of $Y_t$ in the orbit $Y_t\mathcal{O}_r$. This is also relevant from a practical perspective.
To see this, note that, as a bilinear form (on $\mathbb{R}^{n\times r}\times\mathbb{R}^m\times\mathbb{S}^r_{\mathrm{skew}}$), $\mathcal{J}_{Y_t,t+\Delta t}(Y,\lambda,\mu)$ reads

$$\begin{aligned}
&\mathcal{J}_{Y_t,t+\Delta t}(Y,\lambda,\mu)[(H,\Delta\lambda,\Delta\mu),(H,\Delta\lambda,\Delta\mu)]\\
&\quad= \operatorname{trace}\big(H^T(C_{t+\Delta t} - \mathcal{A}^*_{t+\Delta t}(\lambda))H\big) - 2\langle\Delta\lambda, \mathcal{A}_{t+\Delta t}(YH^T + HY^T)\rangle - 2\langle\Delta\mu, Y_t^TH - H^TY_t\rangle.
\end{aligned}$$

For any fixed Q𝒪r𝑄subscript𝒪𝑟Q\in\mathcal{O}_{r}italic_Q ∈ caligraphic_O start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT one therefore has

$$\begin{aligned}
&\mathcal{J}_{Y_t,t+\Delta t}(Y_t,\lambda_t,0)[(H,\Delta\lambda,\Delta\mu),(H,\Delta\lambda,\Delta\mu)]\\
&\quad= \mathcal{J}_{Y_tQ,t+\Delta t}(Y_tQ,\lambda_t,0)[\mathcal{T}_Q(H,\Delta\lambda,\Delta\mu),\mathcal{T}_Q(H,\Delta\lambda,\Delta\mu)]
\end{aligned}$$

with the unitary linear operator $\mathcal{T}_Q(H,\Delta\lambda,\Delta\mu) = (HQ, \Delta\lambda, Q^T\Delta\mu Q)$ on $\mathbb{R}^{n\times r}\times\mathbb{R}^m\times\mathbb{S}^r_{\mathrm{skew}}$. It follows that $\mathcal{J}_{Y_t,t+\Delta t}(Y_t,\lambda_t,0)$ and $\mathcal{J}_{Y_tQ,t+\Delta t}(Y_tQ,\lambda_t,0)$ have the same eigenvalues.
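The orbit invariance can also be checked numerically. The Python/NumPy sketch below assembles the symmetric matrix of the quadratic form displayed above in orthonormal coordinates (row-major $\operatorname{vec}(H)$, $\Delta\lambda$, and coordinates of $\Delta\mu$ in an orthonormal skew-symmetric basis), evaluated at $Y = Y_t$ and $\mu = 0$. A random symmetric matrix stands in for $C_{t+\Delta t} - \mathcal{A}^*_{t+\Delta t}(\lambda_t)$, and $\mathcal{A}(X) = (\langle A_i, X\rangle)_i$ with random symmetric $A_i$ is assumed; the helper `kkt_matrix` and all data are hypothetical. The sketch compares the spectra at $Y_t$ and at $Y_tQ$ for a random orthogonal $Q$.

```python
import numpy as np

def kkt_matrix(Y, Z, A):
    """Symmetric matrix of the quadratic form of J at (Y, lambda, 0), with
    Z = C - A^*(lambda), in orthonormal coordinates (H, dlam, dmu)."""
    n, r = Y.shape
    rows = [2 * (Ai @ Y).ravel() for Ai in A]           # H -> A(Y H^T + H Y^T)
    for j in range(r):
        for k in range(j + 1, r):
            E = np.zeros((r, r))
            E[j, k], E[k, j] = 1 / np.sqrt(2), -1 / np.sqrt(2)
            rows.append(2 * (Y @ E).ravel())            # H -> <E, Y^T H - H^T Y>
    B = np.array(rows)
    K = np.kron(Z, np.eye(r))                           # H -> trace(H^T Z H)
    return np.block([[K, -B.T], [-B, np.zeros((B.shape[0], B.shape[0]))]])

rng = np.random.default_rng(1)
n, r, m = 5, 3, 4
Y = rng.standard_normal((n, r))
Zs = rng.standard_normal((n, n))
Zs = (Zs + Zs.T) / 2                                    # stand-in for C - A^*(lambda)
A = [(M + M.T) / 2 for M in rng.standard_normal((m, n, n))]
Q, _ = np.linalg.qr(rng.standard_normal((r, r)))        # random orthogonal Q
eig_Y = np.linalg.eigvalsh(kkt_matrix(Y, Zs, A))
eig_YQ = np.linalg.eigvalsh(kkt_matrix(Y @ Q, Zs, A))
print(np.max(np.abs(eig_Y - eig_YQ)))                   # should be at rounding level
```

The two matrices are related by the orthogonal change of coordinates induced by $\mathcal{T}_Q$, which is why the spectra agree for any data, not only at a KKT point.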

However, our proof of Theorem 4.2 is by contradiction and hence does not provide an obvious lower bound on the radius of invertibility of $\mathcal{J}_{Y_t,t}(Y_t,\lambda_t,0)$. We do not investigate this in more depth here. In the error analysis below, we will essentially assume that such a bound is available (cf. Lemma 4.5).

4.2 A path-following predictor-corrector algorithm

We now describe in detail the path-following predictor-corrector algorithm that we propose for tracking the trajectory of solutions to (SDP$_t$). It includes an optional adaptive step-size tuning, which is based on measuring the residual of the optimality conditions, defined as

$$\operatorname{res}_t(Y,\lambda) := \left\|\begin{array}{c} 2[C_t - \mathcal{A}^*_t(\lambda)]Y \\ \mathcal{A}_t(YY^T) - b_t \end{array}\right\|_{\infty}. \tag{RES}$$

The residual expresses the maximal component-wise violation of the KKT optimality conditions for problem (BM$_t$) and is therefore a suitable error measure. Indeed (see, e.g., [57, Theorems 3.1 and 3.2]), if the second-order sufficient condition for optimality holds at $(Y_t,\lambda_t)$, then there are constants $\eta, C_1, C_2 > 0$ such that for all $(Y,\lambda)$ with $\|(Y,\lambda) - (Y_t,\lambda_t)\| \leq \eta$ one has

$$C_1\|(Y,\lambda) - (Y_t,\lambda_t)\| \leq \operatorname{res}_t(Y,\lambda) \leq C_2\|(Y,\lambda) - (Y_t,\lambda_t)\|.$$

Here and in the following, we use the norm $\|(Y,\lambda)\|^2 = \|Y\|_F^2 + \|\lambda\|^2$.
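As an illustration, (RES) can be evaluated with a few lines of Python/NumPy. This minimal sketch assumes $\mathcal{A}_t(X) = (\langle A_i, X\rangle)_{i=1}^m$ and $\mathcal{A}_t^*(\lambda) = \sum_i \lambda_i A_i$ for symmetric matrices $A_i$; the function name `kkt_residual` is hypothetical. The usage example is a two-node max-cut relaxation ($A_i = e_ie_i^T$, $b = (1,1)$, $C$ with zero diagonal and unit off-diagonal), for which $Y = (1,-1)^T$ and $\lambda = (-1,-1)$ satisfy the KKT conditions exactly: $Z = C - \mathcal{A}^*(\lambda) = C + I$ gives $ZY = 0$, and $\operatorname{diag}(YY^T) = (1,1)$.

```python
import numpy as np

def kkt_residual(Y, lam, C, A, b):
    """res_t(Y, lam) of (RES): largest absolute entry of 2 [C - A^*(lam)] Y
    and of A(Y Y^T) - b, under the assumed operator conventions."""
    Z = C - sum(l * Ai for l, Ai in zip(lam, A))              # C_t - A_t^*(lam)
    stationarity = 2.0 * Z @ Y                                 # first block of (RES)
    X = Y @ Y.T
    feasibility = np.array([np.sum(Ai * X) for Ai in A]) - b   # <A_i, Y Y^T> - b_i
    return max(np.abs(stationarity).max(), np.abs(feasibility).max())

# Two-node max-cut relaxation: min <C, X> s.t. X_11 = X_22 = 1, X psd.
C = np.array([[0.0, 1.0], [1.0, 0.0]])
A = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]                 # A_i = e_i e_i^T
b = np.ones(2)
Y_opt = np.array([[1.0], [-1.0]])
lam_opt = np.array([-1.0, -1.0])
print(kkt_residual(Y_opt, lam_opt, C, A, b))                   # 0.0
Y_pert = Y_opt + np.array([[0.01], [0.0]])
print(kkt_residual(Y_pert, lam_opt, C, A, b))                  # about 0.0201
```

The max-norm matches the component-wise reading of (RES); for the perturbed point the feasibility block ($1.01^2 - 1$) dominates the stationarity block ($2 \cdot 0.01$).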

The overall procedure is displayed as Algorithm 1 below. Given a TV-SDP of the form (SDP$_t$), parameterized over a time interval $[0,T]$, the inputs are an approximate initial primal-dual solution pair $(\hat{X}_0, Z(\hat{\lambda}_0))$ to (SDP$_0$)–(D-SDP$_0$) and an initial step size $\Delta t_0$. At each iteration, the current iterate is used to construct the linear system (4.4), which is then solved, returning the updates $\Delta Y$ and $\Delta\lambda$. The presented version of the algorithm also includes a step-size tuning procedure that can be activated through the Boolean variable stepsize_TUNING and is intended to ensure that the residual threshold is satisfied at every time step. Specifically, if the threshold is violated at a time step, the step size is reduced by a factor $\gamma_1 \in (0,1)$ and a more accurate solution is obtained by solving the linearized KKT system (4.4) for the reduced time step. On the other hand, to avoid unnecessarily small steps, the step size is increased after every successful step by a factor $\gamma_2 > 1$ (but is never made larger than $\Delta t_0$). If the step-size tuning is deactivated, the algorithm simply runs with the constant step size $\Delta t_0$ (except possibly for a shorter final step that ends exactly at $T$).
Note that Algorithm 1 tracks both the primal solution $X_t$ and the dual solution $Z_t = C_t - \mathcal{A}^*_t(\lambda_t)$.

Algorithm 1 Path-following predictor-corrector for (SDP$_t$) with $t \in [0,T]$

Input: an initial approximate primal-dual solution $(\hat{X}_0, Z(\hat{\lambda}_0))$ to (SDP$_0$)–(D-SDP$_0$)
initial step size $\Delta t_0$
Boolean variable stepsize_TUNING
step-size tuning parameters $\gamma_1 \in (0,1)$, $\gamma_2 > 1$
residual tolerance $\epsilon > 0$
Output: approximate solutions $\{\hat{X}_k\}_{k=0,\dots,K}$ to (SDP$_t$) for $t \in \{0,\dots,t_k,\dots,T\}$

1:  $k \leftarrow 0$
2:  $t_0 \leftarrow 0$
3:  $\Delta t \leftarrow \Delta t_0$
4:  $S = \{\hat{X}_0\}$, $r = \operatorname{rank}(\hat{X}_0)$
5:  find $\hat{Y}_0 \in \mathbb{R}^{n\times r}$ such that $\hat{Y}_0\hat{Y}_0^T = \hat{X}_0$
6:  while $t_k < T$ do
7:     solve linear system (4.4) with data $\Delta t, t_k, \hat{Y}_k, \hat{\lambda}_k$ and obtain $\Delta Y, \Delta\lambda$
8:     if stepsize_TUNING and $\operatorname{res}_{t_k+\Delta t}(\hat{Y}_k + \Delta Y, \hat{\lambda}_k + \Delta\lambda) > \epsilon$ then
9:        $\Delta t \leftarrow \gamma_1\Delta t$
10:        go back to step 6
11:     $(t_{k+1}, \hat{Y}_{k+1}, \hat{\lambda}_{k+1}) \leftarrow (t_k + \Delta t, \hat{Y}_k + \Delta Y, \hat{\lambda}_k + \Delta\lambda)$
12:     append $\hat{X}_{k+1} = \hat{Y}_{k+1}\hat{Y}_{k+1}^T$ to $S$
13:     if stepsize_TUNING then
14:        $\Delta t \leftarrow \min(T - t_{k+1}, \gamma_2\Delta t, \Delta t_0)$
15:     else
16:        $\Delta t \leftarrow \min(T - t_{k+1}, \Delta t)$
17:     $k \leftarrow k + 1$
18:  return $S$
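The control flow of Algorithm 1 can be sketched in Python as follows. This is not the authors' implementation: the solve of the linearized KKT system (4.4) and the residual (RES) are passed in as user-supplied callbacks (`step` and `res`, hypothetical names), steps 4–5 use an eigendecomposition with an assumed relative rank tolerance, and the safeguard `dt_min` is an addition that Algorithm 1 does not have. In the usage example, (4.4) is replaced by a toy scalar "corrector" that lands on the exact path $Y(t) = 1 + t$ up to an error of $\Delta t^2$, so that the step-size tuning is exercised.

```python
import numpy as np

def factor_psd(X, tol=1e-8):
    """Steps 4-5: return Y with Y @ Y.T ~= X and r = rank(X) columns."""
    w, V = np.linalg.eigh((X + X.T) / 2)
    keep = w > tol * max(w.max(), 1.0)       # numerical rank, relative tolerance
    return V[:, keep] * np.sqrt(w[keep])     # scale eigenvectors by sqrt(eigenvalues)

def path_following(X0, lam0, T, dt0, step, res, tuning=False,
                   gamma1=0.5, gamma2=2.0, eps=1e-6, dt_min=1e-12):
    """Sketch of Algorithm 1. step(dt, t, Y, lam) -> (dY, dlam) stands in for
    solving (4.4); res(t, Y, lam) stands in for (RES). Returns S and the t_k."""
    t, dt = 0.0, dt0
    Y, lam = factor_psd(X0), np.asarray(lam0, dtype=float)
    S, times = [Y @ Y.T], [t]
    while t < T:                                       # step 6
        dY, dlam = step(dt, t, Y, lam)                 # step 7
        if tuning and res(t + dt, Y + dY, lam + dlam) > eps:
            dt *= gamma1                               # step 9
            if dt < dt_min:                            # safeguard, not in Algorithm 1
                raise RuntimeError("step size fell below dt_min")
            continue                                   # step 10: back to step 6
        t, Y, lam = t + dt, Y + dY, lam + dlam         # step 11
        S.append(Y @ Y.T)                              # step 12
        times.append(t)
        if tuning:
            dt = min(T - t, gamma2 * dt, dt0)          # step 14
        else:
            dt = min(T - t, dt)                        # step 16
    return S, times

# Toy usage: exact path Y(t) = 1 + t; the hypothetical corrector errs by dt**2.
def toy_step(dt, t, Y, lam):
    return (1.0 + t + dt + dt**2) - Y, np.zeros_like(lam)

def toy_res(t, Y, lam):
    return float(np.abs(Y - (1.0 + t)).max())

X0 = np.array([[1.0]])
S_fixed, t_fixed = path_following(X0, [0.0], 1.0, 0.25, toy_step, toy_res)
S_tuned, t_tuned = path_following(X0, [0.0], 1.0, 0.25, toy_step, toy_res,
                                  tuning=True, eps=0.01)
print(len(S_fixed), len(S_tuned))                      # 5 17
```

With $\epsilon = 0.01$, steps of length $0.25$ and $0.125$ are rejected (errors $0.0625$ and $0.015625$), so the tuned run settles on $\Delta t = 0.0625$ and needs 16 accepted steps, compared with 4 for the untuned run.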

4.3 Error analysis

We analyze the algorithm without step-size tuning. The main goal of the following error analysis is to show that the computed $(\hat{X}_k, \hat{\lambda}_k)$, where $\hat{X}_k = \hat{Y}_k\hat{Y}_k^T$, remain close to the exact solutions $(X_{t_k}, \lambda_{t_k})$ if properly initialized. The logic of the proof is similar to that for standard path-following methods based on Newton's method; see, e.g., [22]. The specific form of our problem requires some additional considerations, which allow for more precise quantitative bounds in terms of the problem constants.

Throughout this section, $(X_t = Y_tY_t^T, Z_t)$ is an optimal primal-dual pair of solutions to (SDP$_t$)–(D-SDP$_t$) satisfying the five assumptions (A1)–(A5), so that it is strictly complementary (see Definition 2.2) and $X_t$ is primal nondegenerate. The choice of the factor $Y_t$ is arbitrary, since it does not affect any of the subsequent statements. In Lemma 4.1 and its proof, we have seen that for every $X_t$ the unique Lagrange multiplier $\lambda_t$ satisfies $Z_t = C_t - \mathcal{A}^*_t(\lambda_t)$, that is,

$$\lambda_t = (\mathcal{A}^*_t)^{\dagger}(C_t - Z_t)$$

with $(\mathcal{A}^*_t)^{\dagger}$ being the pseudo-inverse of $\mathcal{A}^*_t$. By assumption (A5), $C_t$ and $\mathcal{A}^*_t$ depend smoothly on $t$, and so does $(\mathcal{A}^*_t)^{\dagger}$: since $\mathcal{A}_t$ is surjective for all $t$ by assumption (A2), the operator $\mathcal{A}_t\mathcal{A}^*_t$ is invertible and $(\mathcal{A}^*_t)^{\dagger} = (\mathcal{A}_t\mathcal{A}^*_t)^{-1}\mathcal{A}_t$. Also, by Theorem 2.4, $t \mapsto Z_t$ is smooth. Therefore, the curve $t \mapsto \lambda_t$ is smooth. Since the algorithm operates in the $(Y, \lambda)$ space, our implicit goal is to show that the iterates stay close to the set

$$\mathcal{C} := \{(Y_t, \lambda_t) \mid (Y_tY_t^T, Z(\lambda_t)) \text{ is an optimal primal-dual pair to (SDP}_t\text{)–(D-SDP}_t\text{)},\ t \in [0,T]\}$$

containing the optimal primal-dual trajectories in the Burer–Monteiro factorization.
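To make the multiplier formula for $\lambda_t$ concrete, the following NumPy sketch recovers $\lambda$ from $C$ and $Z$ by a least-squares solve, which coincides with applying $(\mathcal{A}^*)^{\dagger}$ when $\mathcal{A}^*$ is injective. It assumes the common constraint form $\mathcal{A}(X) = (\langle A_i, X\rangle)_{i=1}^m$ with symmetric $A_i$, so that $\mathcal{A}^*(\lambda) = \sum_i \lambda_i A_i$; the random data and the helper names `adjoint` and `recover_multiplier` are illustrative assumptions, not part of the method.

```python
import numpy as np

def adjoint(A_mats, lam):
    """A^*(lam) = sum_i lam_i A_i for a stack A_mats of shape (m, n, n)."""
    return np.tensordot(lam, A_mats, axes=1)

def recover_multiplier(A_mats, C, Z):
    """Least-squares solution of A^*(lam) = C - Z, i.e. lam = (A^*)^+ (C - Z)."""
    m = A_mats.shape[0]
    B = A_mats.reshape(m, -1).T          # columns vec(A_i), so B @ lam = vec(A^*(lam))
    lam, *_ = np.linalg.lstsq(B, (C - Z).ravel(), rcond=None)
    return lam

rng = np.random.default_rng(0)
n, m = 5, 3
G = rng.standard_normal((m, n, n))
A_mats = (G + G.transpose(0, 2, 1)) / 2  # hypothetical symmetric A_i (independent a.s.)
lam_true = rng.standard_normal(m)
C = rng.standard_normal((n, n))
C = (C + C.T) / 2
Z = C - adjoint(A_mats, lam_true)        # dual slack Z = C - A^*(lam)
lam = recover_multiplier(A_mats, C, Z)
```

Because the $A_i$ are linearly independent, the least-squares solution is unique and reproduces `lam_true` up to rounding.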

Lemma 4.4.

The set $\mathcal{C}$ is compact.

Proof.

As the curve $t \mapsto \lambda_t$ is continuous, it suffices to prove that the set $\mathcal{C}_Y = \{Y_t \mid t \in [0,T]\}$ is compact. Since $\|Y_t\|_F = \sqrt{\operatorname{trace}(X_t)}$ and $t \mapsto X_t$ is smooth on the compact interval $[0,T]$, the set $\mathcal{C}_Y$ is bounded. To see that it is closed, let $(Y_n) \subset \mathcal{C}_Y$ be a convergent sequence with limit $Y$ such that $Y_nY_n^T = X_{t_n}$ for some $t_n \in [0,T]$. By passing to a subsequence, we can assume $t_n \to t \in [0,T]$. By continuity of $t \mapsto X_t$, we then have $X_t = \lim_n X_{t_n} = \lim_n Y_nY_n^T = YY^T$, which shows that $Y$ is in the set. ∎

We consider the norm on $\mathbb{R}^{n\times r} \times \mathbb{R}^m \times \mathbb{S}^r_{skew}$ defined by $\|(Y,\lambda,\mu)\|^2 = \|Y\|_F^2 + \|\lambda\|^2 + \|\mu\|_F^2$. The induced operator norm is denoted by $\|\cdot\|_{op}$.

Lemma 4.5.

There exists a constant $m > 0$ such that

$$\|\mathcal{J}_{Y_t,t}(Y_t,\lambda_t,0)^{-1}\|_{op} \leq \frac{1}{m} \tag{4.6}$$

for all $(Y_t,\lambda_t) \in \mathcal{C}$.

Proof.

On its open domain of definition, the map $(Y,\lambda) \mapsto \|\mathcal{J}(Y,\lambda,0)^{-1}\|_{op}$ is continuous. By Theorem 4.2, the compact set $\mathcal{C}$ is contained in that domain. Therefore, $\|\mathcal{J}(Y,\lambda,0)^{-1}\|_{op}$ attains a finite maximum on $\mathcal{C}$, and (4.6) holds with $m$ equal to the reciprocal of this maximum. ∎
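Numerically, once $\mathcal{J}_{Y_t,t}(Y_t,\lambda_t,0)$ is written as a square matrix with respect to an orthonormal basis of $\mathbb{R}^{n\times r}\times\mathbb{R}^m\times\mathbb{S}^r_{skew}$, its inverse operator norm is the reciprocal of its smallest singular value. The sketch below uses a hypothetical smooth family of nonsingular matrices as a stand-in for the Jacobians along the path; sampling finitely many $t$ only estimates $m$ and is not a certified bound.

```python
import numpy as np

def inverse_op_norm(J):
    """Return ||J^{-1}||_op (Euclidean norm) as 1 / sigma_min(J)."""
    return 1.0 / np.linalg.svd(J, compute_uv=False)[-1]   # singular values are descending

def estimate_m(jacobians):
    """Estimate the constant m in (4.6) as the smallest sigma_min over the samples."""
    return min(np.linalg.svd(J, compute_uv=False)[-1] for J in jacobians)

# Hypothetical stand-in for t -> J(Y_t, lambda_t, 0): a smooth, nonsingular matrix family.
rng = np.random.default_rng(1)
E = rng.standard_normal((4, 4))
family = [np.diag([3.0, 2.0, 2.5, 4.0]) + 0.1 * np.sin(t) * E
          for t in np.linspace(0.0, 1.0, 50)]
m_est = estimate_m(family)
```

By construction, `1 / m_est` equals the largest sampled value of $\|\mathcal{J}^{-1}\|_{op}$.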

Lemma 4.6.

For any $t \in [0,T]$ and $\hat{Y} \in \mathbb{R}^{n\times r}$, the map $(Y,\lambda,\mu) \mapsto \mathcal{J}_{\hat{Y},t}(Y,\lambda,\mu)$ is Lipschitz continuous in the operator norm on $\mathbb{R}^{n\times r}\times\mathbb{R}^m\times\mathbb{S}^r_{skew}$. Specifically,

$$\|\mathcal{J}_{\hat{Y},t}(Y_1,\lambda_1,\mu_1) - \mathcal{J}_{\hat{Y},t}(Y_2,\lambda_2,\mu_2)\|_{op} \leq 12\sqrt{3}\,\|\mathcal{A}_t\|\,\|(Y_1,\lambda_1,\mu_1) - (Y_2,\lambda_2,\mu_2)\|$$

for all $(Y_1,\lambda_1,\mu_1)$ and $(Y_2,\lambda_2,\mu_2)$, where $\|\mathcal{A}_t\|$ is the operator norm of $\mathcal{A}_t$.

Proof.

It follows from (4.1) that as a bilinear form one has

$$\begin{aligned}
&\big(\mathcal{J}_{\hat{Y},t}(Y_1,\lambda_1,\mu_1) - \mathcal{J}_{\hat{Y},t}(Y_2,\lambda_2,\mu_2)\big)[(H,\Delta\lambda,\Delta\mu),(H,\Delta\lambda,\Delta\mu)]\\
&\quad= \operatorname{trace}\big(H^T\mathcal{A}^*_t(\lambda_2-\lambda_1)H\big) - 2(\Delta\lambda)^T\mathcal{A}_t\big((Y_1-Y_2)H^T + H(Y_1-Y_2)^T\big)\\
&\quad\leq \|\mathcal{A}_t\|\|\lambda_1-\lambda_2\|\|H\|_F^2 + 4\|\mathcal{A}_t\|\|Y_1-Y_2\|_F\|H\|_F\|\Delta\lambda\|\\
&\quad\leq \big(\|\mathcal{A}_t\|\|\lambda_1-\lambda_2\| + 4\|\mathcal{A}_t\|\|Y_1-Y_2\|_F\big)\big(\|H\|_F + \|\Delta\lambda\| + \|\Delta\mu\|_F\big)^2\\
&\quad\leq 4\|\mathcal{A}_t\|\big(\|Y_1-Y_2\|_F + \|\lambda_1-\lambda_2\| + \|\mu_1-\mu_2\|_F\big)\big(\|H\|_F + \|\Delta\lambda\| + \|\Delta\mu\|_F\big)^2\\
&\quad\leq 12\sqrt{3}\,\|\mathcal{A}_t\|\,\|(Y_1,\lambda_1,\mu_1)-(Y_2,\lambda_2,\mu_2)\|\,\|(H,\Delta\lambda,\Delta\mu)\|^2.
\end{aligned}$$

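In the last step, we applied the Cauchy–Schwarz inequality $a+b+c \leq \sqrt{3}\,\|(a,b,c)\|$ to both factors, which gives the constant $4\cdot\sqrt{3}\cdot 3 = 12\sqrt{3}$. This proves the claim. ∎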
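The following NumPy sketch checks on random data the two elementary estimates behind the first inequality of the chain, as well as the Cauchy–Schwarz step $(a+b+c)(x+y+z)^2 \leq \sqrt{3}\,\|(a,b,c)\|\cdot 3\,\|(x,y,z)\|^2$ that turns the factor $4$ into $12\sqrt{3}$. It assumes $\mathcal{A}(X) = (\langle A_i, X\rangle)_i$ with hypothetical random symmetric $A_i$, in which case $\|\mathcal{A}\|$ is the spectral norm of the matrix with rows $\operatorname{vec}(A_i)$. A passing check is a sanity test, not a proof.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r, m = 6, 2, 4
G = rng.standard_normal((m, n, n))
A_mats = (G + G.transpose(0, 2, 1)) / 2             # hypothetical constraint matrices A_i
A_norm = np.linalg.norm(A_mats.reshape(m, -1), 2)   # ||A|| = spectral norm of [vec(A_i)]_i

def A_op(X):
    """A(X)_i = <A_i, X> (Frobenius inner product)."""
    return np.tensordot(A_mats, X, axes=([1, 2], [0, 1]))

def A_adj(lam):
    """A^*(lam) = sum_i lam_i A_i."""
    return np.tensordot(lam, A_mats, axes=1)

for _ in range(1000):
    H, E = rng.standard_normal((n, r)), rng.standard_normal((n, r))   # E plays Y1 - Y2
    dlam, Dlam = rng.standard_normal(m), rng.standard_normal(m)        # lambda2 - lambda1, Delta lambda
    term1 = np.trace(H.T @ A_adj(dlam) @ H)
    term2 = 2 * Dlam @ A_op(E @ H.T + H @ E.T)
    assert abs(term1) <= A_norm * np.linalg.norm(dlam) * np.linalg.norm(H) ** 2 + 1e-9
    assert abs(term2) <= 4 * A_norm * np.linalg.norm(E) * np.linalg.norm(H) * np.linalg.norm(Dlam) + 1e-9
    # Last step: (a+b+c)(x+y+z)^2 <= sqrt(3)||(a,b,c)|| * 3||(x,y,z)||^2 for nonnegative entries
    u, v = np.abs(rng.standard_normal(3)), np.abs(rng.standard_normal(3))
    assert u.sum() * v.sum() ** 2 <= 3 * np.sqrt(3) * np.linalg.norm(u) * np.linalg.norm(v) ** 2 + 1e-9
```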

Since $t \mapsto \mathcal{A}_t$ is assumed to be continuous, the constant $M = \max_{t\in[0,T]} 12\sqrt{3}\,\|\mathcal{A}_t\|$ satisfies the uniform Lipschitz condition

$$\|\mathcal{J}_{\hat{Y},t}(Y_1,\lambda_1,\mu_1) - \mathcal{J}_{\hat{Y},t}(Y_2,\lambda_2,\mu_2)\|_{op} \leq M\,\|(Y_1,\lambda_1,\mu_1) - (Y_2,\lambda_2,\mu_2)\| \tag{4.7}$$

for all $(Y_1,\lambda_1,\mu_1)$ and $(Y_2,\lambda_2,\mu_2)$, independently of $t \in [0,T]$ and of the choice of $\hat{Y} \in \mathbb{R}^{n\times r}$. In what follows, we use (4.7) and (4.6) without further investigating the sharpest possible bounds.

In addition, let $\lambda_r(X_t) \geq \lambda_* > 0$ be a uniform lower bound on the smallest positive eigenvalue as in (3.11). Furthermore, we now also assume a uniform upper bound

$$\|Y_t\|_2 = \sqrt{\lambda_1(X_t)} \leq \sqrt{\Lambda_*}$$

on the spectral norm of $Y_t$. Finally, let $\|\dot{X}_t\|_F \leq L$ be as in (3.10). Since the curve $t \mapsto \lambda_t$ is smooth, the constant

$$K := \max_{t\in[0,T]} \|\dot{\lambda}_t\| \tag{4.8}$$

is also well-defined.
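The bounds $\lambda_*$ and $\Lambda_*$ can be read off from the singular values of any factor: for $X = YY^T$ with $Y \in \mathbb{R}^{n\times r}$ of full column rank, $\lambda_1(X) = \sigma_1(Y)^2 = \|Y\|_2^2$ and $\lambda_r(X) = \sigma_r(Y)^2$. The short NumPy check below confirms this on a random factor; the helper name and matrix sizes are arbitrary choices for illustration.

```python
import numpy as np

def factor_eig_bounds(Y):
    """Return (lambda_1(X), lambda_r(X)) for X = Y Y^T from the singular values of Y."""
    s = np.linalg.svd(Y, compute_uv=False)   # descending, length r when n >= r
    return s[0] ** 2, s[-1] ** 2

rng = np.random.default_rng(3)
Y = rng.standard_normal((8, 3))               # random full-column-rank factor
eig = np.linalg.eigvalsh(Y @ Y.T)[::-1]       # eigenvalues of X = Y Y^T, descending
lam1, lamr = factor_eig_bounds(Y)
```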

With the necessary constants at hand, we are now in a position to state our main result of the error analysis. The following theorem shows that the distance between the iterates of Algorithm 1 and the set of solutions to (BM$_t$) stays bounded by $\delta$, provided the initial point is close enough to the set of initial solutions and the step size $\Delta t$ is small enough. Here we again employ the natural distance measure $\min_{Q\in\mathcal{O}_r}\|\hat{Y} - YQ\|_F$ between the orbits $\hat{Y}\mathcal{O}_r$ and $Y\mathcal{O}_r$, cf. Remark 3.4.
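The orbit distance $\min_{Q\in\mathcal{O}_r}\|\hat{Y} - YQ\|_F$ is an orthogonal Procrustes problem: if $Y^T\hat{Y} = USV^T$ is an SVD, a minimizer is $Q = UV^T$. The NumPy sketch below is one standard way to evaluate it; the function name `orbit_distance` and the test data are illustrative. At the optimum, $(YQ)^T\hat{Y} = VSV^T$ is symmetric positive semidefinite.

```python
import numpy as np

def orbit_distance(Y_hat, Y):
    """Return (min_Q ||Y_hat - Y Q||_F, Q) over orthogonal r x r matrices Q."""
    U, _, Vt = np.linalg.svd(Y.T @ Y_hat)    # Y^T Y_hat = U S V^T
    Q = U @ Vt                                # orthogonal Procrustes minimizer
    return np.linalg.norm(Y_hat - Y @ Q), Q

rng = np.random.default_rng(4)
Y = rng.standard_normal((7, 3))
Q0, _ = np.linalg.qr(rng.standard_normal((3, 3)))      # random orthogonal matrix
Y_hat = Y @ Q0 + 1e-3 * rng.standard_normal((7, 3))    # slightly perturbed point on the orbit of Y
dist, Q = orbit_distance(Y_hat, Y)
```

Since the minimum is global, `dist` never exceeds $\|\hat{Y} - YQ_0\|_F$ for any particular orthogonal $Q_0$.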

Theorem 4.7.

Let $\delta > 0$ and $\Delta t > 0$ be small enough such that the following three conditions are satisfied:

$$(2\sqrt{\Lambda_*}+\delta)\delta + L\Delta t < \frac{2\lambda_*}{\sqrt{r+4}+\sqrt{r}}, \tag{4.9}$$

$$\delta < \frac{2}{3}\frac{m}{M}, \tag{4.10}$$

$$\left[\frac{1}{\lambda_*}\big((2\sqrt{\Lambda_*}+\delta)\delta + L\Delta t\big)^2 + \sqrt{r}\,\big((2\sqrt{\Lambda_*}+\delta)\delta + L\Delta t\big)\right]^2 + (\delta + K\Delta t)^2 \leq \frac{2}{3}\frac{m}{M}\delta. \tag{4.11}$$

Assume for the initial point $(\hat{Y}_0, \hat{\lambda}_0)$ that

$$\min_{Q\in\mathcal{O}_r}\|(\hat{Y}_0,\hat{\lambda}_0) - (Y_0Q,\lambda_0)\| \leq \delta. \tag{4.12}$$

Then Algorithm 1 is well-defined, and for all $t_{k+1} = t_k + \Delta t$ the iterates satisfy

$$\min_{Q\in\mathcal{O}_r}\|(\hat{Y}_k,\hat{\lambda}_k) - (Y_{t_k}Q,\lambda_{t_k})\| \leq \delta.$$

It then holds that

$$\|\hat{X}_k - X_{t_k}\|_F \leq (2\sqrt{\Lambda_*}+\delta)\delta$$

for all $t_k$.
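Conditions (4.9)–(4.11) are explicit in the constants, so they can be checked mechanically once $\Lambda_*$, $\lambda_*$, $L$, $K$, $m$, $M$ and $r$ are known or estimated. The plain-Python sketch below does this; the constant values in the example are hypothetical. We read the bracket in (4.11) as $\frac{1}{\lambda_*}\varepsilon^2 + \sqrt{r}\,\varepsilon$ with $\varepsilon = (2\sqrt{\Lambda_*}+\delta)\delta + L\Delta t$, which matches the form of the sufficient condition on $\|\hat{X}_0 - X_0\|_F$ stated below; this reading is an assumption of the sketch.

```python
import math

def theorem_4_7_conditions(delta, dt, *, Lam, lam, L, K, m, M, r):
    """Return the truth values of (4.9), (4.10), (4.11) for given delta and dt.

    Lam = Lambda_*, lam = lambda_*; all constants are user-supplied inputs.
    Assumption: the bracket in (4.11) is eps^2 / lam + sqrt(r) * eps.
    """
    eps = (2 * math.sqrt(Lam) + delta) * delta + L * dt     # bound on ||X_hat_t - X_{t+dt}||_F
    c49 = eps < 2 * lam / (math.sqrt(r + 4) + math.sqrt(r))
    c410 = delta < (2 / 3) * m / M
    lhs = (eps ** 2 / lam + math.sqrt(r) * eps) ** 2 + (delta + K * dt) ** 2
    c411 = lhs <= (2 / 3) * (m / M) * delta
    return c49, c410, c411

consts = dict(Lam=1.0, lam=0.5, L=1.0, K=1.0, m=1.0, M=10.0, r=2)   # hypothetical constants
ok = theorem_4_7_conditions(1e-3, 1e-4, **consts)    # (True, True, True)
bad = theorem_4_7_conditions(1e-3, 5e-2, **consts)   # (True, True, False): dt too large for (4.11)
```

The second call illustrates that (4.11) can fail for a fixed $\delta$ if $\Delta t$ is not small relative to $\delta$.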

Notice that the left-hand side of (4.11) is $O(\delta^2 + \Delta t^2)$ as $\delta, \Delta t \to 0$, whereas the right-hand side is only $O(\delta)$. Therefore, (4.11) is satisfied if $\delta$ is small enough and $\Delta t$ is small enough relative to $\delta$ (for instance, $\Delta t \leq c\,\delta$ with a fixed $c > 0$). Furthermore, a sufficient condition for (4.12) to hold is that

$$\|\hat{\lambda}_0 - \lambda_0\| \leq \frac{\delta}{\sqrt{2}}$$

and

$$\|\hat{X}_0 - X_0\|_F \leq \frac{\sqrt{r\lambda_*^2 + 2\sqrt{2}\,\delta\lambda_*} - \sqrt{r\lambda_*^2}}{2},$$

which easily follows from (3.9).
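The tolerance on $\|\hat{X}_0 - X_0\|_F$ is the positive root $\varepsilon$ of $\frac{1}{\lambda_*}\varepsilon^2 + \sqrt{r}\,\varepsilon = \delta/\sqrt{2}$, so that the factor part of (4.12) is at most $\delta/\sqrt{2}$ and, together with $\|\hat{\lambda}_0 - \lambda_0\| \leq \delta/\sqrt{2}$, the joint error is at most $\delta$. We infer this form of the bound from the formula itself, since (3.9) is not reproduced here; the plain-Python sketch below evaluates the tolerance for hypothetical values and checks the root property.

```python
import math

def initial_X_tolerance(delta, lam_star, r):
    """Tolerance on ||X_hat_0 - X_0||_F from the sufficient condition for (4.12)."""
    return (math.sqrt(r * lam_star ** 2 + 2 * math.sqrt(2) * delta * lam_star)
            - math.sqrt(r * lam_star ** 2)) / 2

delta, lam_star, r = 1e-2, 0.5, 3          # hypothetical values
eps = initial_X_tolerance(delta, lam_star, r)
# eps should solve eps^2 / lam_star + sqrt(r) * eps = delta / sqrt(2)
residual = eps ** 2 / lam_star + math.sqrt(r) * eps - delta / math.sqrt(2)
```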

Proof.

We analyze one step of the algorithm under the induction hypothesis that at time $t = t_k$ there exists $(Y_t, \lambda_t) \in \mathcal{C}$ satisfying

$$\|(\hat{Y}_t,\hat{\lambda}_t) - (Y_t,\lambda_t)\|_F \leq \delta.$$

We aim to show that, for sufficiently small $\delta > 0$ and $\Delta t > 0$, the next iterate $(\hat{Y}_{t+\Delta t}, \hat{\lambda}_{t+\Delta t})$ of the algorithm is well-defined and satisfies the same estimate

$$\|(\hat{Y}_{t+\Delta t},\hat{\lambda}_{t+\Delta t}) - (Y_{t+\Delta t},\lambda_{t+\Delta t})\|_F \leq \delta$$

with an exact solution $(Y_{t+\Delta t}, \lambda_{t+\Delta t}) \in \mathcal{C}$. The theorem then follows by induction over the steps of the algorithm.

We first claim that there exists an exact solution $Y_{t+\Delta t}$ in the horizontal space of $\hat{Y}_t$, that is, $X_{t+\Delta t} = Y_{t+\Delta t}Y_{t+\Delta t}^T$ and $h_{\hat{Y}_t}(Y_{t+\Delta t}) = 0$. Indeed, using (4.9) we have

$$\begin{aligned}
\|\hat X_t - X_t\|_F &= \|(\hat Y_t - Y_t)\hat Y_t^T + Y_t(\hat Y_t - Y_t)^T\|_F \\
&\le (\|Y_t\|_2 + \|\hat Y_t\|_2)\,\delta \le (2\sqrt{\Lambda_*} + \delta)\,\delta < \frac{2\lambda_*}{\sqrt{r+4} + \sqrt{r}} - L\Delta t.
\end{aligned}$$

This yields

$$\|\hat X_t - X_{t+\Delta t}\|_F \le \|\hat X_t - X_t\|_F + \|X_t - X_{t+\Delta t}\|_F < \frac{2\lambda_*}{\sqrt{r+4} + \sqrt{r}}.$$

Thus, Proposition 3.3 guarantees the existence of $Y_{t+\Delta t}$ as desired. We note for later use that, by (3.9), it satisfies

$$\begin{aligned}
\|\hat Y_t - Y_{t+\Delta t}\|_F &\le \frac{1}{\lambda_*}\|\hat X_t - X_{t+\Delta t}\|_F^2 + \sqrt{r}\,\|\hat X_t - X_{t+\Delta t}\|_F \\
&\le \frac{1}{\lambda_*}\big(\|\hat X_t - X_t\|_F + L\Delta t\big)^2 + \sqrt{r}\,\|\hat X_t - X_t\|_F + L\Delta t \\
&\le \frac{1}{\lambda_*}\big((2\sqrt{\Lambda_*} + \delta)\delta + L\Delta t\big)^2 + \sqrt{r}\,(2\sqrt{\Lambda_*} + \delta)\delta + L\Delta t.
\end{aligned} \tag{4.13}$$

The matrix $Y_{t+\Delta t}$ is an exact solution of (BM$_{t+\Delta t}$), and by Theorem 4.2 there is a unique Lagrange multiplier $\lambda_{t+\Delta t}$ such that $\mathcal{F}_{\hat Y_t, t+\Delta t}(Y_{t+\Delta t}, \lambda_{t+\Delta t}, 0) = 0$. By construction, the next iterate $(\hat Y_{t+\Delta t}, \hat\lambda_{t+\Delta t}, \hat\mu_{t+\Delta t})$ of the algorithm is obtained by one step of Newton's method for this equation with starting point $(\hat Y_t, \hat\lambda_t, 0)$. In light of (4.6) and (4.7), standard results on Newton's method (e.g., Theorem 1.2.5 in [42]) imply that, under the condition

$$\|(\hat Y_t, \hat\lambda_t, 0) - (Y_{t+\Delta t}, \lambda_{t+\Delta t}, 0)\|_F \le \varepsilon < \frac{2}{3}\frac{m}{M},$$

one step of the method is well defined, i.e., $\mathcal{J}_{\hat Y_t, t+\Delta t}(\hat Y_t, \hat\lambda_t, 0)$ is invertible, and the new iterate satisfies

$$\|(\hat Y_{t+\Delta t}, \hat\lambda_{t+\Delta t}, \hat\mu_{t+\Delta t}) - (Y_{t+\Delta t}, \lambda_{t+\Delta t}, 0)\|_F \le \frac{3}{2}\frac{M}{m}\|(\hat Y_t, \hat\lambda_t, 0) - (Y_{t+\Delta t}, \lambda_{t+\Delta t}, 0)\|_F^2.$$

In particular, the choice $\varepsilon = \left(\frac{2}{3}\frac{m}{M}\delta\right)^{1/2}$ yields the desired estimate

$$\|(\hat Y_{t+\Delta t}, \hat\lambda_{t+\Delta t}) - (Y_{t+\Delta t}, \lambda_{t+\Delta t})\|_F \le \frac{3}{2}\frac{M}{m}\varepsilon^2 = \delta.$$

Therefore, we need to ensure that

$$\|(\hat Y_t, \hat\lambda_t) - (Y_{t+\Delta t}, \lambda_{t+\Delta t})\|_F \le \left(\frac{2}{3}\frac{m}{M}\delta\right)^{1/2} < \frac{2}{3}\frac{m}{M}$$

is satisfied. Here the second inequality is just condition (4.10). We now show that (4.11) is sufficient for the first inequality. Using (4.8), we have

$$\|\hat\lambda_t - \lambda_{t+\Delta t}\|^2 \le \big(\|\hat\lambda_t - \lambda_t\| + K\Delta t\big)^2 \le (\delta + K\Delta t)^2.$$

Together with (4.13) this gives

$$\|\hat Y_t - Y_{t+\Delta t}\|_F^2 + \|\hat\lambda_t - \lambda_{t+\Delta t}\|^2 \le \left[\frac{1}{\lambda_*}\big[(2\sqrt{\Lambda_*} + \delta)\delta + L\Delta t\big]^2 + \sqrt{r}\,(2\sqrt{\Lambda_*} + \delta)\delta + L\Delta t\right]^2 + (\delta + K\Delta t)^2.$$

Condition (4.11) ensures that the right-hand side does not exceed $\frac{2}{3}\frac{m}{M}\delta$, which is the required estimate, and the proof is complete. ∎
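Two elementary ingredients of the proof above can be sanity-checked numerically: the factor identity $\hat X_t - X_t = (\hat Y_t - Y_t)\hat Y_t^T + Y_t(\hat Y_t - Y_t)^T$ with the bound $\|\hat X_t - X_t\|_F \le (\|Y_t\|_2 + \|\hat Y_t\|_2)\|\hat Y_t - Y_t\|_F$, and the choice of $\varepsilon$ that turns the quadratic Newton estimate into an error of exactly $\delta$. The Python/NumPy snippet below is a minimal check on random data; the sizes $n, r$ and the sample values of $m$, $M$, $\delta$ are arbitrary assumptions, and the snippet is not taken from the authors' code.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n, r = 30, 5

# Y stands for an exact factor Y_t, Y_hat for a nearby iterate \hat Y_t.
Y = rng.standard_normal((n, r))
Y_hat = Y + 1e-3 * rng.standard_normal((n, r))
D = Y_hat - Y
X, X_hat = Y @ Y.T, Y_hat @ Y_hat.T

# Identity: X_hat - X = (Y_hat - Y) Y_hat^T + Y (Y_hat - Y)^T
identity_gap = np.linalg.norm((X_hat - X) - (D @ Y_hat.T + Y @ D.T))

# Bound: ||X_hat - X||_F <= (||Y||_2 + ||Y_hat||_2) * ||Y_hat - Y||_F
lhs = np.linalg.norm(X_hat - X, "fro")
rhs = (np.linalg.norm(Y, 2) + np.linalg.norm(Y_hat, 2)) * np.linalg.norm(D, "fro")

# Newton step: with eps = (2/3 * m/M * delta)^(1/2), the quadratic estimate
# 3/2 * M/m * eps^2 equals delta (m, M, delta are arbitrary sample values).
m, M, delta = 0.5, 4.0, 0.01
eps = math.sqrt(2.0 / 3.0 * m / M * delta)
newton_bound = 1.5 * M / m * eps**2

print(identity_gap, lhs <= rhs, newton_bound, eps < 2.0 / 3.0 * m / M)
```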

5 Numerical experiments on Time-Varying Max Cut

In this section, we compare tracking the solution trajectory of a TV-SDP via Algorithm 1 with tracking the same trajectory by solving the problem at discrete time points with interior-point methods (IPMs). In our experiments, we used the implementation of the homogeneous self-dual algorithm [6, 24] in the MOSEK Optimization Suite, version 9.3 [40]. Furthermore, to compare with an alternative warm-start approach, we also ran experiments with the Splitting Conic Solver (SCS), version 3.2.2 [46]. This package implements the first-order method presented in [44, 45], which applies an operator splitting method, the alternating direction method of multipliers (ADMM), to the homogeneous self-dual embedding. We show that the algorithm proposed in this paper can outperform, in accuracy as well as in runtime, repeated IPM solves of the time-invariant SDPs and warm-started SCS.

The Max-Cut problem is a well-known problem in graph theory: given a weighted graph $\mathcal{G} = (V, E)$, we wish to find a partition of the vertex set $V$ into two subsets (also known as a cut) of maximal weight, where the weight of the cut is the sum of the weights of the edges in $E$ connecting the two subsets. This problem can be formulated as the following quadratically constrained quadratic program:

$$\begin{aligned}
\max_{x\in\mathbb{R}^n}\quad & \sum_{i,j=1}^n w_{i,j}(1 - x_i x_j) \\
\text{s.t.}\quad & x_i^2 = 1 \quad \text{for all } i \in \{1,\dots,n\},
\end{aligned} \tag{MC}$$

where $n = |V|$ is the number of vertices of the graph, $w_{i,j}$ is the weight of the edge connecting vertices $i$ and $j$, and the variable $x_i \in \{1, -1\}$ indicates the subset to which vertex $i$ is assigned. This problem can be relaxed to an SDP of the form

$$\begin{aligned}
\min_{X\in\mathbb{S}^n}\quad & \langle W, X\rangle \\
\text{s.t.}\quad & X_{i,i} = 1 \quad \text{for all } i \in \{1,\dots,n\}, \\
& X \succeq 0,
\end{aligned} \tag{MCR}$$

where $W$ is the weight matrix with entries $w_{i,j}$; see [25]. Note that the number of constraints equals the dimension $n$ of the variable matrix. Randomized approximation algorithms for (MC) based on the convex relaxation (MCR) deliver solutions with an expected performance ratio of approximately 0.878 for nonnegative weights and, assuming the Unique Games Conjecture, are the best possible polynomial-time algorithms for approximately solving (MC).
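The randomized approximation algorithms mentioned above round a solution of (MCR) to a cut. Below is a minimal Python/NumPy sketch of the random-hyperplane rounding of Goemans and Williamson, assuming a factor $Y \in \mathbb{R}^{n\times r}$ with $X = YY^T$ feasible for (MCR), i.e., with unit-norm rows. The function names, the number of trials and the random test instance are illustrative assumptions. For symmetric $W$, the (MC) objective as written counts every cut edge four times (the factor $1 - x_ix_j = 2$ appears for both orderings of its endpoints), so it equals four times the cut weight; the snippet makes this explicit.

```python
import numpy as np

def mc_objective(x, W):
    """(MC) objective sum_{i,j} w_ij (1 - x_i x_j) as written above."""
    return float(np.sum(W * (1.0 - np.outer(x, x))))

def cut_weight(x, W):
    """Total weight of edges {i, j}, i < j, whose endpoints get different signs."""
    return float(np.sum(np.triu(W, 1) * (np.outer(x, x) < 0)))

def hyperplane_rounding(Y, W, num_trials=50, seed=0):
    """Random-hyperplane rounding of a factor Y with X = Y Y^T feasible for
    (MCR): x = sign(Y g) for Gaussian g; the best of num_trials cuts is kept."""
    rng = np.random.default_rng(seed)
    best_x, best_val = None, -np.inf
    for _ in range(num_trials):
        g = rng.standard_normal(Y.shape[1])    # normal of a random hyperplane
        x = np.where(Y @ g >= 0, 1.0, -1.0)    # side of each vertex vector
        val = mc_objective(x, W)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val

# Usage on a random instance with nonnegative symmetric weights.
rng = np.random.default_rng(1)
n, r = 20, 4
W = np.triu(np.abs(rng.standard_normal((n, n))), 1)
W = W + W.T
Y = rng.standard_normal((n, r))
Y /= np.linalg.norm(Y, axis=1, keepdims=True)  # unit rows: diag(Y Y^T) = 1
x, val = hyperplane_rounding(Y, W)
```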

In this paper, we adopt a time-varying version of (MCR) as a benchmark, in which the data matrix depends on a time parameter, $t \in [0,1] \mapsto W_t \in \mathbb{S}^n$. (We point out that this differs from the recently studied variant [37, 31] with edge insertions and deletions, which can be viewed as a discontinuous dependence of the weights on time.)

In our experiments, $W_t$ is a random linear perturbation of a sparse weight matrix with density 50%. Specifically,

$$W_t = W_0 + tW_1,$$

where the entries of $W_0$ are drawn from a normal distribution with mean $\mu = 10$ and standard deviation $\sigma = 10$, while the entries of $W_1$ are drawn from a normal distribution with $\mu = \sigma = 1$. Both matrices have the same sparsity structure. We refer to such a problem as the time-varying max-cut relaxation (TV-MCR), which can be thought of as a convex relaxation of a max-cut problem in which the edge weights of a given graph change over time.
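A possible way to generate such instances is sketched below. Only the two normal distributions, the 50% density and the shared sparsity pattern come from the text; the symmetrization by mirroring the upper triangle, the zero diagonal, the way the pattern is drawn, and the names `tv_mcr_instance` and `weight_matrix` are hypothetical choices, so the instances produced need not coincide with the published data.

```python
import numpy as np

def tv_mcr_instance(n=100, density=0.5, seed=0):
    """Hypothetical generator for W_t = W0 + t W1: symmetric, zero diagonal,
    shared random sparsity pattern; W0 entries ~ N(10, 10^2), W1 entries ~ N(1, 1)."""
    rng = np.random.default_rng(seed)
    # Shared pattern on the strict upper triangle (assumed), mirrored below.
    mask = np.triu(rng.random((n, n)) < density, k=1)
    W0 = np.where(mask, rng.normal(10.0, 10.0, size=(n, n)), 0.0)
    W1 = np.where(mask, rng.normal(1.0, 1.0, size=(n, n)), 0.0)
    return W0 + W0.T, W1 + W1.T

def weight_matrix(t, W0, W1):
    """Linear time dependence W_t = W0 + t W1 for t in [0, 1]."""
    return W0 + t * W1

W0, W1 = tv_mcr_instance()
W_half = weight_matrix(0.5, W0, W1)
```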

All experiments were conducted on a personal computer with a 1.6 GHz Intel Core i5 dual-core processor and 16 GB RAM, using a Python implementation of our path-following algorithm. Since the main goal was to illustrate the potential computational benefits of our algorithm, we did not attempt to provide the most efficient implementation. The code (https://github.com/antoniobellon/burer-monteiro-path-following, Eclipse Public License 2.0) as well as the data and experimental results (https://zenodo.org/record/7769225) are available online.

We performed experiments on 110 instances of the TV-MCR problem with $n = 100$ vertices and tracked the solution trajectory for $t \in [0,1]$. Among these, we included 10 instances of TV-MCR for which the rank of the solution is not constant, hence violating our assumption (A4). These were found by computing the rank (treating eigenvalues below $10^{-7}$ as zero) of the solutions obtained with MOSEK on a 10-step subdivision of the interval $[0,1]$ and selecting ten cases in which the rank changed. Using the same procedure, we checked that for the remaining 100 instances the rank of the solution is constant at the sampled time points.
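The rank check can be sketched as follows: count the eigenvalues above the tolerance $10^{-7}$ for each sampled solution and flag an instance if the counts differ along the time grid. Whether the tolerance is absolute or relative is not stated, so an absolute threshold is assumed here; the helper names are hypothetical, and synthetic low-rank matrices stand in for MOSEK solutions.

```python
import numpy as np

def numerical_rank(X, tol=1e-7):
    """Number of eigenvalues of a symmetric PSD matrix X exceeding tol;
    eigenvalues <= tol are treated as zero (absolute threshold assumed)."""
    return int(np.sum(np.linalg.eigvalsh(X) > tol))

def has_rank_change(solutions, tol=1e-7):
    """Return (changed, ranks) for solutions sampled along the time grid."""
    ranks = [numerical_rank(X, tol) for X in solutions]
    return len(set(ranks)) > 1, ranks

# Usage with synthetic low-rank PSD matrices standing in for solver output.
rng = np.random.default_rng(0)
Y3 = rng.standard_normal((50, 3))
Y4 = rng.standard_normal((50, 4))
changed, ranks = has_rank_change([Y3 @ Y3.T, Y3 @ Y3.T, Y4 @ Y4.T])
```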

First, we applied Algorithm 1 without step size adjustment, i.e., with step size_TUNING set to FALSE, using step sizes $\Delta t = 0.1, 0.01, 0.001$, so that 10, 100, and 1000 iterations, respectively, are performed in each experiment (see Figures 1 and 2). The factor dimension $r$ was chosen equal to the rank of an initial solution obtained with MOSEK with relative gap termination tolerance $10^{-14}$. Its distribution is shown in Table 1.

r                4    5    6    7
# occurrences    2   39   53    6
Table 1: Distribution of the rank $r$ over the 100 instances of TV-MCR with $n = 100$ and a constant-rank solution trajectory.
Figure 1: Distribution of the average residuals as a function of the step size for three methods: an interior point method (IPM, bordeaux), the splitting conic solver (SCS, orange), and our path-following (PF) algorithm (green). Both plots show the same data, except that the left plot also includes the ten rank-changing instances, depicted by light green dots, which are removed in the right plot.

Figure 1 shows, as a function of the step size, the distribution over 100 instances of the residual averaged along the tracked solution on the time interval $[0,1]$. In each box-and-whisker plot, the whiskers span the interval from the minimum to the maximum, while the box spans the first to the third quartile, with a horizontal line at the median.
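The statistics drawn in each box can be computed as in the sketch below. The quartile convention used for the figures is not stated; NumPy's default linear interpolation is assumed, and `whisker_summary` is a hypothetical helper name.

```python
import numpy as np

def whisker_summary(values):
    """Statistics drawn in each box-and-whisker plot: whiskers at the minimum
    and maximum, box from the first to the third quartile, line at the median."""
    v = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(v, [25, 50, 75])
    return {"min": v.min(), "q1": q1, "median": median, "q3": q3, "max": v.max()}

summary = whisker_summary([1.0, 2.0, 3.0, 4.0, 5.0])
```

With Matplotlib, the same whisker convention is obtained by passing `whis=(0, 100)` to `Axes.boxplot`, which places the whiskers at the 0th and 100th percentiles of the data.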

In the left plot, the light green dots correspond to the average residuals of the 10 rank-changing instances, which do not satisfy our assumption (A4); the right plot excludes these degenerate instances from the data set. The green plot shows the average residual obtained by tracking the solution with Algorithm 1; the orange plot shows the average residual when tracking with SCS, warm-started with the current solution and with relative and absolute feasibility tolerances set to $10^{-7}$; finally, the bordeaux plot shows the average residual when tracking with MOSEK IPM [40] with relative gap termination tolerance $10^{-15}$.

The residual of an SDP primal-dual solution $(X, Z(\lambda))$ is defined, in analogy to (RES), as

$$\operatorname{res}_t(X, \lambda) := \left\| \begin{array}{c} 2[C_t - \mathcal{A}_t^*(\lambda)]X \\ \mathcal{A}_t(X) - b_t \end{array} \right\|_\infty.$$
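For (MCR), the operators in this definition are explicit: $C_t = W_t$, $\mathcal{A}_t(X) = \operatorname{diag}(X)$, $\mathcal{A}_t^*(\lambda) = \operatorname{Diag}(\lambda)$ and $b_t = (1, \dots, 1)^T$. A minimal NumPy sketch of the residual under these identifications follows; reading $\|\cdot\|_\infty$ as the largest absolute entry over both blocks is an assumption, and `mcr_residual` is a hypothetical name. The usage example checks it on a $2 \times 2$ instance whose optimal primal-dual pair is known in closed form.

```python
import numpy as np

def mcr_residual(X, lam, W):
    """res_t(X, lam) for (MCR) at a fixed time t, using C_t = W_t,
    A_t(X) = diag(X), A_t^*(lam) = Diag(lam) and b_t = (1, ..., 1).
    The infinity norm is taken entrywise over both blocks (assumption)."""
    stationarity = 2.0 * (W - np.diag(lam)) @ X   # 2 [C_t - A_t^*(lam)] X
    feasibility = np.diag(X) - 1.0                 # A_t(X) - b_t
    return max(np.abs(stationarity).max(), np.abs(feasibility).max())

# Usage: for W = [[0, 1], [1, 0]], X = [[1, -1], [-1, 1]] with lam = (-1, -1)
# is an optimal primal-dual pair, so the residual vanishes.
W = np.array([[0.0, 1.0], [1.0, 0.0]])
X_opt = np.array([[1.0, -1.0], [-1.0, 1.0]])
res_opt = mcr_residual(X_opt, np.array([-1.0, -1.0]), W)
res_id = mcr_residual(np.eye(2), np.zeros(2), W)   # feasible, not stationary
```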

With a suitable step size (of order $10^{-2}$ in our experiments), Algorithm 1 achieves an average residual comparable to that of standard IPMs with a very small relative gap termination tolerance. For a step size of order $10^{-3}$, the average residual of our algorithm is about 100 times smaller than those of both the IPM and warm-started SCS. Furthermore, as we see next, our approach reaches this accuracy much faster.

Figure 2 shows, as a function of the step size, the runtime distributions of Algorithm 1 (green), of the IPM (bordeaux) with relative gap termination tolerance $10^{-15}$, and of warm-started SCS (orange), each used to track the solution trajectory with a constant step size.

For each step size that we tested, the mean runtime of Algorithm 1 is about ten times smaller than those of both SCS and MOSEK IPM, indicating that our algorithm is computationally competitive.


Figure 2: Distribution of the runtime as a function of the step size.

Figure 3: Average runtime of MOSEK IPM and Algorithm 1 for tracking the TV-SDP solutions with the same residual accuracy on a grid, as a function of the number of grid points: (a) relative gap termination tolerance $10^{-9}$; (b) relative gap termination tolerance $10^{-15}$.

Finally, we apply Algorithm 1 to the same set of TV-MCR problems with step size adjustment (setting step size_TUNING to TRUE). To provide a fair comparison with MOSEK IPM, we fixed five uniform grids on the interval $[0,1]$ with 20, 40, 60, 80, and 100 equidistant points, respectively. For each grid and each grid point, we used MOSEK with a relative gap termination tolerance of $10^{-14}$ to obtain the corresponding TV-SDP solution, recording the runtime and the average residual over the tracking of each instance. For each grid, we then ran our algorithm with step size adjustment so as to achieve the same average residual as MOSEK, additionally forcing the path-following procedure to hit the grid points. In this way, our procedure matches MOSEK both in the solution residual and in the tracking resolution.
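One simple way to force a path-following loop to hit prescribed grid points is to shorten any proposed step that would jump past the next grid point. The sketch below shows only this clipping, with a fixed proposed step of 0.3 on a grid of spacing 0.25; the residual-based step size adjustment of Algorithm 1 is not reproduced, and the helper `clip_to_grid` is hypothetical.

```python
import bisect

def clip_to_grid(t, dt, grid, tol=1e-12):
    """Shorten a proposed step dt so that t + dt does not pass the next
    point of the sorted grid (hypothetical helper, not the paper's code)."""
    i = bisect.bisect_right(grid, t + tol)   # first grid point strictly after t
    if i < len(grid) and t + dt > grid[i]:
        return grid[i] - t
    return dt

# Usage: a proposed step of 0.3 on a grid with spacing 0.25.
grid = [k / 4 for k in range(5)]             # 0, 0.25, 0.5, 0.75, 1
t, visited = 0.0, [0.0]
while t < grid[-1] - 1e-12:
    t += clip_to_grid(t, 0.3, grid)
    visited.append(t)
```

In a full implementation, the proposed step would come from the residual-based adjustment, and clipping would be applied afterwards.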

Figure 3 shows the runtime distributions of Algorithm 1 (green) and of the IPM as a function of the number of grid points, for two relative gap termination tolerances of the IPM: $10^{-9}$ (Figure 3(a)) and $10^{-15}$ (Figure 3(b)).

We observe that Algorithm 1 attains the same accuracy and tracking resolution as MOSEK at a smaller average runtime. The runtime of Algorithm 1 in the right plot is nearly constant because, to match the residual accuracy of the IPM, the path-following procedure must take considerably more steps than there are grid points, so the number of steps is essentially independent of the grid; in the left plot, by contrast, it suffices for Algorithm 1 to follow the grid.

6 Conclusion

In this paper, we proposed an algorithm for solving time-varying SDPs based on a path-following predictor-corrector scheme for the Burer–Monteiro factorization. The restriction to a horizontal space ensures that the linearized system of KKT conditions is uniquely solvable under standard regularity assumptions on the TV-SDP problem, thus leading to a well-defined path-following procedure with rigorous error bounds on the distance from the optimal trajectory. Preliminary numerical experiments on a time-varying version of the max-cut SDP relaxation suggest that our algorithm is competitive both in terms of runtime and accuracy when compared to the application of standard IPMs. Future work should explore the applicability and relative merits of our approach in further applications.

So far we have assumed that the rank $r$ of the true solution curve is known and remains constant. While this is certainly appropriate for a rigorous analysis as conducted in this work, it might be restrictive in practice. An important extension hence would be to develop rank-adaptive versions of our path-following approach that are able to detect and adjust the appropriate rank in a Burer–Monteiro factorization, for example, by monitoring the smallest singular values of the matrices $Y_t$.
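As an illustration of the monitoring idea, the hypothetical helper below removes directions of a factor whose singular values are negligible relative to the largest one, which reduces $r$ while approximately preserving $Y_tY_t^T$. It sketches a rank-decrease heuristic only; detecting when the rank should increase, and choosing the tolerance `rel_tol`, are left open.

```python
import numpy as np

def truncate_factor(Y, rel_tol=1e-6):
    """Hypothetical rank-decrease heuristic: drop the directions of Y whose
    singular values are below rel_tol * sigma_max. Returns a thinner factor
    Y_k with Y_k Y_k^T approximately equal to Y Y^T."""
    U, s, _ = np.linalg.svd(Y, full_matrices=False)
    keep = s > rel_tol * s[0]
    return U[:, keep] * s[keep]   # scale the kept left singular vectors

# Usage: a 20 x 5 factor whose product has rank 3.
rng = np.random.default_rng(0)
Y = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 5))
Y_k = truncate_factor(Y)
```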

Another important aspect is the initialization of the method, which requires an accurate SDP solution and is currently not based on the Burer–Monteiro factorization, thus undermining the computational efficiency of the whole approach. A natural remedy is to solve the initial problem with the factorized approach as well [16]. The meta-algorithm presented in [36] even does this in a rank-adaptive way. Although the factorized problem is nonconvex, several works, including [13, 48, 19], have proposed Burer–Monteiro schemes with guaranteed and certifiable convergence to a globally optimal low-rank factor under mild conditions, making this a reliable approach in practice.
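For (MCR), the factorized problem is to minimize $\langle W, YY^T\rangle$ over $Y \in \mathbb{R}^{n\times r}$ with unit-norm rows, i.e., over a product of spheres. The sketch below uses plain Riemannian gradient descent with row normalization as the retraction and a heuristic fixed step size; it is a simple stand-in for illustration, not the algorithm of [16], [36] or [13], and the function names are hypothetical. The choice $r = 8$ for $n = 30$ satisfies $r(r+1)/2 > n$, the regime considered in results such as [14, 15].

```python
import numpy as np

def bm_maxcut(W, r, steps=500, seed=0):
    """Approximately minimize <W, Y Y^T> over Y in R^{n x r} with unit-norm rows,
    i.e. the Burer-Monteiro form of (MCR), by Riemannian gradient descent on the
    product of unit spheres (row normalization is used as the retraction)."""
    rng = np.random.default_rng(seed)
    Y = rng.standard_normal((W.shape[0], r))
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)
    lr = 1.0 / (2.0 * np.linalg.norm(W, 2))             # heuristic fixed step size
    for _ in range(steps):
        G = 2.0 * W @ Y                                  # Euclidean gradient
        G -= np.sum(G * Y, axis=1, keepdims=True) * Y   # project rows onto tangent spaces
        Y = Y - lr * G
        Y /= np.linalg.norm(Y, axis=1, keepdims=True)   # retract back to the spheres
    return Y

def objective(W, Y):
    """<W, Y Y^T>, the (MCR) objective at X = Y Y^T."""
    return float(np.sum(W * (Y @ Y.T)))

# Usage on a random symmetric cost matrix.
rng = np.random.default_rng(2)
A = rng.standard_normal((30, 30))
W = (A + A.T) / 2
Y0 = bm_maxcut(W, r=8, steps=0)   # the random initial point
Y = bm_maxcut(W, r=8)
```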

Acknowledgments

The research leading to these results received funding from the OP RDE under Grant Agreement CZ.02.1.01/0.0/0.0/16_019/0000765. The first author gratefully acknowledges the support of the Czech Science Foundation (grant 22-15524S). The authors also thank two anonymous referees for their helpful comments.

References

  • [1] S. Aaronson, X. Chen, E. Hazan, S. Kale, and A. Nayak. Online learning of quantum states. J. Stat. Mech. Theory Exp., 2019:124019 (14 pages), 2019.
  • [2] P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization algorithms on matrix manifolds. Princeton University Press, Princeton, NJ, 2008.
  • [3] A. A. Ahmadi and B. El Khadir. Time-varying semidefinite programs. Math. Oper. Res., 46(3):1054–1080, 2021.
  • [4] F. Alizadeh, J.-P. A. Haeberly, and M. L. Overton. Complementarity and nondegeneracy in semidefinite programming. Math. Program., 77(2, Ser. B):111–128, 1997.
  • [5] E. L. Allgower and K. Georg. Introduction to numerical continuation methods. SIAM, Philadelphia, 2003.
  • [6] E. D. Andersen, C. Roos, and T. Terlaky. On implementing a primal-dual interior-point method for conic quadratic optimization. Math. Program., 95:249–277, 2003.
  • [7] E. J. Anderson. A Continuous Model For Job-Shop Scheduling. PhD thesis, University of Cambridge, Cambridge, 1978.
  • [8] R. Balan and C. B. Dock. Lipschitz analysis of generalized phase retrievable matrix frames. SIAM J. Matrix Anal. Appl., 43(3):1518–1571, 2022.
  • [9] A. Barvinok. Problems of distance geometry and convex properties of quadratic maps. Discrete Comput. Geom., 13(2):189–202, 1995.
  • [10] A. Barvinok. A remark on the rank of positive semidefinite matrices subject to affine constraints. Discrete Comput. Geom., 25(1):23–31, 2001.
  • [11] R. Bellman. Bottleneck problems and dynamic programming. Proc. Nat. Acad. Sci. USA, 39:947–951, 1953.
  • [12] A. Bellon, D. Henrion, V. Kungurtsev, and J. Mareček. Time-varying semidefinite programming: Geometry of the trajectory of solutions. arXiv:2104.05445, 2021.
  • [13] N. Boumal. A Riemannian low-rank method for optimization over semidefinite matrices with block-diagonal constraints. arXiv:1506.00575, 2015.
  • [14] N. Boumal, V. Voroninski, and A. Bandeira. The non-convex Burer-Monteiro approach works on smooth semidefinite programs. In D. Lee et al., editors, Advances in Neural Information Processing Systems, volume 29, pages 2757–2765. Curran Associates, Inc., 2016.
  • [15] N. Boumal, V. Voroninski, and A. S. Bandeira. Deterministic guarantees for Burer-Monteiro factorizations of smooth semidefinite programs. Comm. Pure Appl. Math., 73(3):581–608, 2020.
  • [16] S. Burer and R. D. C. Monteiro. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Math. Program., 95(2, Ser. B):329–357, 2003.
  • [17] S. Burer and R. D. C. Monteiro. Local minima and convergence in low-rank semidefinite programming. Math. Program., 103(3, Ser. A):427–444, 2005.
  • [18] D. Cifuentes. On the Burer-Monteiro method for general semidefinite programs. Optim. Lett., 15(6):2299–2309, 2021.
  • [19] D. Cifuentes and A. Moitra. Polynomial time guarantees for the Burer-Monteiro method. In S. Koyejo et al., editors, Advances in Neural Information Processing Systems, volume 35, pages 23923–23935. Curran Associates, Red Hook, NY, 2022.
  • [20] M. Colombo, J. Gondzio, and A. Grothey. A warm-start approach for large-scale stochastic linear programs. Math. Program., 127(2, Ser. A):371–397, 2011.
  • [21] G. B. Dantzig. Large-scale systems optimizations with application to energy. Technical Report SOL 77-3, April 1977.
  • [22] Q. T. Dinh, C. Savorgnan, and M. Diehl. Adjoint-based predictor-corrector sequential convex programming for parametric nonlinear optimization. SIAM J. Optim., 22(4):1258–1284, 2012.
  • [23] A. Engau, M. F. Anjos, and A. Vannelli. On interior-point warmstarts for linear and combinatorial optimization. SIAM J. Optim., 20(4):1828–1861, 2010.
  • [24] R. M. Freund. On the behavior of the homogeneous self-dual model for conic convex optimization. Math. Program., 106(3, Ser. A):527–545, 2006.
  • [25] M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM, 42(6):1115–1145, 1995.
  • [26] D. Goldfarb and K. Scheinberg. On parametric semidefinite programming. In Proceedings of the Stieltjes Workshop on High Performance Optimization Techniques (HPOPT ’96), pages 361–377. Elsevier, Amsterdam, Appl. Numer. Math. 29, 1999.
  • [27] J. Gondzio and A. Grothey. Reoptimization with the primal-dual interior point method. SIAM J. Optim., 13(3):842–864, 2002.
  • [28] J. Gondzio and A. Grothey. A new unblocking technique to warmstart interior point methods based on sensitivity analysis. SIAM J. Optim., 19(3):1184–1210, 2008.
  • [29] J. Guddat, F. Guerra Vazquez, and H. T. Jongen. Parametric optimization: singularities, pathfollowing and jumps. B. G. Teubner, Stuttgart; John Wiley & Sons, Ltd., Chichester, 1990.
  • [30] J. D. Hauenstein, A. Mohammad-Nezhad, T. Tang, and T. Terlaky. On computing the nonlinearity interval in parametric semidefinite optimization. Math. Oper. Res., 47(4):2989–3009, 2022.
  • [31] M. Henzinger, A. Noe, and C. Schulz. Practical fully dynamic minimum cut algorithms. In 2022 Proceedings of the Symposium on Algorithm Engineering and Experiments (ALENEX), pages 13–26. SIAM, Philadelphia, 2022.
  • [32] R. A. Horn and C. R. Johnson. Matrix analysis. Cambridge University Press, Cambridge, second edition, 2013.
  • [33] J. Im and H. Wolkowicz. A strengthened Barvinok-Pataki bound on SDP rank. Oper. Res. Lett., 49(6):837–841, 2021.
  • [34] F. Jarre. An interior-point method for minimizing the maximum eigenvalue of a linear combination of matrices. SIAM J. Control Optim., 31(5):1360–1377, 1993.
  • [35] H. Jiang, T. Kathuria, Y. T. Lee, S. Padmanabhan, and Z. Song. A faster interior point method for semidefinite programming. In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science, pages 910–918. IEEE Computer Society, Los Alamitos, CA, 2020.
  • [36] M. Journée, F. Bach, P.-A. Absil, and R. Sepulchre. Low-rank optimization on the cone of positive semidefinite matrices. SIAM J. Optim., 20(5):2327–2351, 2010.
  • [37] E. Kao, V. Gadepally, M. Hurley, M. Jones, J. Kepner, S. Mohindra, P. Monticciolo, A. Reuther, S. Samsi, W. Song, D. Staheli, and S. Smith. Streaming graph challenge: Stochastic block partition. In 2017 IEEE High Performance Extreme Computing Conference (HPEC), IEEE, Piscataway, NJ, pages 1–12, 2017.
  • [38] J. Lavaei and S. H. Low. Zero duality gap in optimal power flow problem. IEEE Transactions on Power Systems, 27(1):92–107, 2012.
  • [39] E. Massart and P.-A. Absil. Quotient geometry with simple geodesics for the manifold of fixed-rank positive-semidefinite matrices. SIAM J. Matrix Anal. Appl., 41(1):171–198, 2020.
  • [40] MOSEK ApS. The MOSEK optimization toolbox for MATLAB manual. Version 9.3, 2019.
  • [41] Y. Nazarathy and G. Weiss. Near optimal control of queueing networks over a finite time horizon. Ann. Oper. Res., 170:233–249, 2009.
  • [42] Y. Nesterov. Lectures on convex optimization. Springer, Cham, Switzerland, 2018.
  • [43] J. Nocedal and S. Wright. Numerical optimization. Springer, New York, second edition, 2006.
  • [44] B. O’Donoghue. Operator splitting for a homogeneous embedding of the linear complementarity problem. SIAM J. Optim., 31(3):1999–2023, 2021.
  • [45] B. O’Donoghue, E. Chu, N. Parikh, and S. Boyd. Conic optimization via operator splitting and homogeneous self-dual embedding. J. Optim. Theory Appl., 169(3):1042–1068, 2016.
  • [46] B. O’Donoghue, E. Chu, N. Parikh, and S. Boyd. SCS: Splitting Conic Solver, version 3.2.2. https://github.com/cvxgrp/scs, Nov. 2022.
  • [47] G. Pataki. On the rank of extreme matrices in semidefinite programs and the multiplicity of optimal eigenvalues. Math. Oper. Res., 23(2):339–358, 1998.
  • [48] D. M. Rosen. Scalable low-rank semidefinite programming for certifiably correct machine perception. In Algorithmic Foundations of Robotics XIV, pages 551–566. Springer, Cham, Switzerland, 2021.
  • [49] D. M. Rosen, L. Carlone, A. S. Bandeira, and J. J. Leonard. A certifiably correct algorithm for synchronization over the special Euclidean group. In Algorithmic Foundations of Robotics XII, pages 64–79. Springer, Cham, Switzerland, 2020.
  • [50] B. A. Schmitt. Perturbation bounds for matrix square roots and Pythagorean sums. Linear Algebra Appl., 174:215–227, 1992.
  • [51] A. Skajaa, E. D. Andersen, and Y. Ye. Warmstarting the homogeneous and self-dual interior point method for linear and conic quadratic problems. Math. Program. Comput., 5(1):1–25, 2013.
  • [52] F. Teren. Minimum time acceleration of aircraft turbofan engines by using an algorithm based on nonlinear programming. NASA Technical Memorandum TM-73741, Lewis Research Center, Cleveland, Ohio, September 1977.
  • [53] L. Tunçel. Potential reduction and primal-dual methods. In Handbook of semidefinite programming, volume 27 of Internat. Ser. Oper. Res. Management Sci., pages 235–265. Kluwer Acad., Boston, MA, 2000.
  • [54] I. Waldspurger and A. Waters. Rank optimality for the Burer-Monteiro factorization. SIAM J. Optim., 30(3):2577–2602, 2020.
  • [55] X. Wang, S. Zhang, and D. D. Yao. Separated continuous conic programming: strong duality and an approximation algorithm. SIAM J. Control Optim., 48(4):2118–2138, 2009.
  • [56] H. Wolkowicz, R. Saigal, and L. Vandenberghe, editors. Handbook of semidefinite programming. Kluwer Academic, Boston, MA, 2000.
  • [57] S. J. Wright. An algorithm for degenerate nonlinear programming with rapid local convergence. SIAM J. Optim., 15(3):673–696, 2005.