Time-Varying Semidefinite Programming:
Path Following a Burer–Monteiro Factorization

Antonio Bellon, Faculty of Electrical Engineering, Czech Technical University in Prague, Karlovo Namesti 13, Prague 121 35, the Czech Republic

Mareike Dressler, School of Mathematics and Statistics, University of New South Wales, Sydney, NSW 2052, Australia

Vyacheslav Kungurtsev*

Jakub Mareček*

André Uschmajew, Institute of Mathematics & Centre for Advanced Analytics and Predictive Sciences, University of Augsburg, 86159 Augsburg, Germany

Abstract

We present an online algorithm for time-varying semidefinite programs (TV-SDPs), based on tracking the solution trajectory of a low-rank matrix factorization, also known as the Burer–Monteiro factorization, in a path-following procedure. In this procedure, a predictor-corrector algorithm solves a sequence of linearized systems. This requires the introduction of a horizontal space constraint to ensure the local injectivity of the low-rank factorization. The method produces a sequence of approximate solutions for the original TV-SDP problem, which we show stay close to the optimal solution path if properly initialized. Numerical experiments for a time-varying Max-Cut SDP relaxation demonstrate the computational advantages of the proposed method for tracking TV-SDPs in terms of runtime compared to off-the-shelf interior point methods.

Key words. Semidefinite programming; nonlinear programming; parametric optimization;
time-varying constrained optimization; Newton-type methods

MSC codes. 49M15, 90C22, 90C30, 90C31

1 Introduction

Semidefinite programs (SDPs) constitute an important class of convex constrained optimization problems that is ubiquitous in statistics, signal processing, control systems, and other areas. In several applications, the data of the problem vary over time, so that the problem is naturally modeled as a time-varying SDP (TV-SDP). In this paper, we consider TV-SDPs of the form

\begin{aligned}\min_{X\in\mathbb{S}^{n}}\quad&\langle C_{t},X\rangle\\ \text{s.t.}\quad&\mathcal{A}_{t}(X)=b_{t},\\ &X\succeq 0,\end{aligned}\qquad(\text{SDP}_{t})

where $t\in[0,T]$ is a time parameter varying on a bounded interval. Here $\mathbb{S}^{n}$ denotes the space of real symmetric $n\times n$ matrices, $\mathcal{A}_{t}\colon\mathbb{S}^{n}\to\mathbb{R}^{m}$ is a linear operator defined by $\mathcal{A}_{t}(X)=(\langle A_{1,t},X\rangle,\dots,\langle A_{m,t},X\rangle)$ for some $A_{1,t},\dots,A_{m,t}\in\mathbb{S}^{n}$ , $b_{t}\in\mathbb{R}^{m}$ , and $C_{t}\in\mathbb{S}^{n}$ . Throughout the paper $\langle\cdot,\cdot\rangle$ denotes the Frobenius inner product and the constraint $X\succeq 0$ requires $X\in\mathbb{S}^{n}$ to be positive semidefinite. In this time-varying setting, one looks for a solution curve $t\mapsto X_{t}$ in $\mathbb{S}^{n}$ such that $X=X_{t}$ is an optimal solution for (SDP_$t$) at each time point $t\in[0,T]$ .

Time-dependent problems leading to TV-SDPs occur in various applications, such as optimal power flow problems in power systems [38], state estimation problems in quantum systems [1], modeling of energy economic problems [21], job-shop scheduling problems [7], as well as problems arising in signal processing, queueing theory [41], or aircraft engineering [52]. TV-SDPs can be seen as a generalization of continuous linear programming problems, which were first studied by Bellman [11] in relation to so-called bottleneck problems in multistage linear production economic processes. Since then, a large body of literature has been devoted to studying continuous linear programs with and without additional assumptions. However, the generalization of this idea to other classes of optimization problems has only recently been considered. In [55], Wang, Zhang, and Yao study continuous conic programs, and finally Ahmadi and Khadir [3] consider time-varying SDPs. In contrast to our setting, they require the data to vary polynomially with time and also restrict themselves to polynomial solutions. Moreover, the problems studied there involve kernel terms and more complicated constraints, while our work addresses TV-SDPs in a simpler sense of univariate parametric SDPs, following the literature thread of [26, 30].

A naive approach to solve the time-varying problem (SDP_$t$) is to consider, at a sequence of times $\{t_{k}\}_{k\in\{1,\dots,K\}}\subseteq[0,T]$, the instances (SDP_$t_{k}$) for $k\in\{1,\dots,K\}$ and solve them one after another. The standard solvers for SDPs are interior point methods [34, 53, 6, 24, 35], which solve them to a prescribed accuracy in time polynomial in the input size. However, these solvers do not scale particularly well, so this brute-force approach may fail in applications where the volume and velocity of the data are large. Furthermore, such a straightforward method does not make use of the local information collected by solving the previous instances of the problem. Even with warm starts [27, 28, 23, 20, 51], the reduction in runtime is likely to be marginal. For instance, [51, sections 5.5 and 5.6] reports a 30–60% reduction of the runtime on a collection of time-varying instances of their own choice.

Instead, in this work, we build on the idea of so-called path-following predictor-corrector algorithms as developed in [29, 5]. In classical predictor-corrector methods, a predictor step approximates the derivative of the solution with respect to the time parameter and uses it to extrapolate the current approximate solution to the next time point; a corrector step then moves this prediction closer to the solution at the new time point. The corrector is based on a Newton step for solving the first-order (KKT) optimality conditions.
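To make the predictor-corrector idea concrete, the following Python sketch tracks the solution curve of a generic smooth system $F(x,t)=0$ whose Jacobian $\partial_{x}F$ is nonsingular along the path. It is a textbook illustration under these assumptions, not the algorithm of section 4: the predictor is an Euler step along $\dot{x}=-(\partial_{x}F)^{-1}\partial_{t}F$ (from the implicit function theorem), and the corrector performs a fixed number of Newton steps at the new time. The helper `path_follow` and the toy problem $F(x,t)=x^{3}+x-t$ are hypothetical.

```python
import numpy as np

def path_follow(F, Fx, Ft, x0, t_grid, newton_steps=2):
    """Hypothetical helper: track x(t) with F(x(t), t) = 0 along t_grid.

    F(x, t) returns the residual, Fx(x, t) the Jacobian with respect to x,
    and Ft(x, t) the partial derivative with respect to t.
    """
    xs = [np.asarray(x0, dtype=float)]
    for t_old, t_new in zip(t_grid[:-1], t_grid[1:]):
        x = xs[-1]
        # Predictor: Euler step along dx/dt = -Fx^{-1} Ft (implicit function theorem).
        x = x - (t_new - t_old) * np.linalg.solve(Fx(x, t_old), Ft(x, t_old))
        # Corrector: a few Newton steps on F(., t_new) = 0, started at the prediction.
        for _ in range(newton_steps):
            x = x - np.linalg.solve(Fx(x, t_new), F(x, t_new))
        xs.append(x)
    return xs

# Toy problem (illustrative): F(x, t) = x^3 + x - t, whose solution starts at x(0) = 0.
F = lambda x, t: x**3 + x - t
Fx = lambda x, t: np.diag(3 * x**2 + 1)
Ft = lambda x, t: -np.ones_like(x)

t_grid = np.linspace(0.0, 2.0, 21)
xs = path_follow(F, Fx, Ft, np.zeros(1), t_grid)
print(xs[-1])  # close to the exact root x(2) = 1
```

Since each prediction is already accurate to second order in the step size, two Newton steps per time step keep the residual small in this toy example; the tracked value at $t=2$ is close to the exact root $x=1$.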

A limiting factor in solving both stationary and time-dependent SDPs is the computational complexity when $n$ is large. A common remedy is the Burer–Monteiro approach, as presented in the seminal works [16, 17]. In this approach, the solution is assumed to admit a low-rank factorization $X=YY^{T}$ with $Y\in\mathbb{R}^{n\times r}$ and $r$ potentially much smaller than $n$. In the optimization literature, the Burer–Monteiro method has been studied extensively as a nonconvex optimization problem, e.g., in terms of algorithms [36], quality of the optimal value [9, 47], and (global) recovery guarantees [14, 15, 18].

In a time-varying setting, the Burer–Monteiro factorization leads to

\begin{aligned}\min_{Y\in\mathbb{R}^{n\times r}}\quad&\langle C_{t},YY^{T}\rangle\\ \text{s.t.}\quad&\mathcal{A}_{t}(YY^{T})=b_{t},\end{aligned}\qquad(\text{BM}_{t})

which for every fixed $t$ is a quadratically constrained quadratic problem. A solution is then a curve $t\mapsto Y_{t}$ in $\mathbb{R}^{n\times r}$, a space whose dimension $nr$ is much smaller than the dimension $\frac{1}{2}n(n+1)$ of $\mathbb{S}^{n}$ when $r\ll n$. However, this comes at the price that the problem (BM_$t$) is now nonconvex. Moreover, local optimization methods may in theory converge to a critical point that is not globally optimal [54], although in practice the method usually performs very well [16, 36, 49].
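For a fixed $t$, the objective and constraints of (BM_$t$) can be evaluated directly from the factor, since $\langle C,YY^{T}\rangle=\operatorname{trace}(Y^{T}CY)$ and likewise $\langle A_{i},YY^{T}\rangle=\operatorname{trace}(Y^{T}A_{i}Y)$, so the $n\times n$ matrix $X$ never needs to be formed. The NumPy sketch below is illustrative only: the helper names, the random data, and the Max-Cut-type constraints $\operatorname{diag}(YY^{T})=1$ (satisfied by giving $Y$ unit-norm rows) are our assumptions.

```python
import numpy as np

def bm_objective(C, Y):
    # <C, Y Y^T> = trace(Y^T C Y), evaluated without forming the n x n matrix Y Y^T.
    return np.sum(Y * (C @ Y))

def bm_constraints(A_list, Y):
    # A(Y Y^T)_i = <A_i, Y Y^T> = trace(Y^T A_i Y).
    return np.array([np.sum(Y * (A @ Y)) for A in A_list])

rng = np.random.default_rng(0)
n, r = 6, 2
C = rng.standard_normal((n, n))
C = (C + C.T) / 2
# Max-Cut-type constraints diag(X) = 1, i.e. A_i = e_i e_i^T (illustrative choice).
A_list = [np.outer(np.eye(n)[i], np.eye(n)[i]) for i in range(n)]
Y = rng.standard_normal((n, r))
Y /= np.linalg.norm(Y, axis=1, keepdims=True)  # unit rows make diag(Y Y^T) = 1

X = Y @ Y.T
print(np.isclose(bm_objective(C, Y), np.trace(C @ X)))     # True
print(np.allclose(bm_constraints(A_list, Y), np.ones(n)))  # True
print(n * r, n * (n + 1) // 2)                             # 12 21
```

The last line contrasts the $nr$ unknowns of (BM_$t$) with the $\frac{1}{2}n(n+1)$ entries of a symmetric $X$.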

The aim of this work is to combine the Burer–Monteiro factorization with path-following predictor-corrector methods and to develop a practical algorithm for approximating the solution of (BM_$t$), and consequently of (SDP_$t$), over time. As we explain in section 3, to apply such methods, we need to address the issue that the solutions of (BM_$t$) are never isolated, due to the nonuniqueness of the Burer–Monteiro factorization caused by orthogonal invariance. In this paper, we apply a well-known technique to handle this problem by restricting the solutions to a so-called horizontal space at every time step. From a geometric perspective, such an approach exploits the fact that equivalent factorizations can be identified as the same element in the corresponding quotient manifold with respect to the orthogonal group action [39].

The paper is structured as follows. In section 2, we review important foundations from the SDP literature and state the main assumptions we make on the TV-SDP problem (SDP_$t$). Section 3 presents the underlying quotient geometry of positive semidefinite rank-$r$ matrices from a linear algebra perspective, focusing in particular on the notion of horizontal space and the domain of injectivity of the map $Y\mapsto YY^{T}$. We then describe in section 4 our path-following predictor-corrector algorithm, which is based on iteratively solving the linearized KKT system for (BM_$t$) over time. A main result is the rigorous error analysis for this algorithm presented in subsection 4.3. In section 5, we showcase numerical results that test our method on a time-varying variant of the well-known Goemans–Williamson SDP relaxation for the Max-Cut problem in combinatorial optimization and graph theory. We conclude in section 6 with a brief discussion of our results.

2 Preliminaries and key assumptions

Naturally, the rigorous formulation of path-following algorithms requires regularity assumptions on the solution curve. In our context, this requires assumptions both on the original TV-SDP problem (SDP_$t$) and on its reformulation (BM_$t$). In particular, for the latter, the correct choice of the dimension $r$ is crucial. In what follows, we present and discuss these assumptions in detail.

First, we briefly review some standard notions and properties for primal-dual SDP pairs; see [4, 56]. Consider the conic dual problem of (SDP_$t$):

\begin{aligned}\max_{w\in\mathbb{R}^{m}}\quad&\langle b_{t},w\rangle\\ \text{s.t.}\quad&Z(w)\coloneqq C_{t}-\mathcal{A}^{*}_{t}(w)\succeq 0,\end{aligned}\qquad(\text{D-SDP}_{t})

where $\mathcal{A}_{t}^{*}\colon w\mapsto\sum_{i=1}^{m}w_{i}A_{i,t}$ is the linear operator adjoint to $\mathcal{A}_{t}$ . For convenience, we often drop the explicit dependence on $w$ and refer to a solution of (D-SDP_$t$) simply as $Z$ . While reviewing the basic properties of SDPs, we assume the time parameter to be fixed and hence omit the subindex $t$ .

The KKT conditions for the pair of primal-dual convex problems (SDP_$t$)-(D-SDP_$t$) read

\begin{aligned}\mathcal{A}(X)&=b, & X&\succeq 0,\\ Z+\mathcal{A}^{*}(w)&=C, & Z&\succeq 0,\\ XZ&=0.\end{aligned}\qquad(2.1)

These are sufficient conditions for the optimality of the pair $(X,Z)$ .
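The KKT system (2.1) translates directly into residuals that can be monitored numerically. The sketch below evaluates them for a tiny instance of our own choosing (not taken from the paper): $n=2$, $C=\begin{pmatrix}0&1\\1&0\end{pmatrix}$ and the Max-Cut-type constraints $\operatorname{diag}(X)=(1,1)$, for which $X=\begin{pmatrix}1&-1\\-1&1\end{pmatrix}$ and $w=(-1,-1)$ are optimal with common value $-2$. The function name `kkt_residuals` is hypothetical.

```python
import numpy as np

def kkt_residuals(A_list, b, C, X, w, Z):
    """Residuals of the KKT system (2.1) for min <C, X> s.t. A(X) = b, X PSD."""
    A_of_X = np.array([np.sum(A * X) for A in A_list])  # A(X)
    A_adj_w = sum(wi * A for wi, A in zip(w, A_list))   # A^*(w)
    return {
        "primal": np.linalg.norm(A_of_X - b),
        "dual": np.linalg.norm(Z + A_adj_w - C),
        "complementarity": np.linalg.norm(X @ Z),
        "min_eig_X": np.linalg.eigvalsh(X).min(),
        "min_eig_Z": np.linalg.eigvalsh(Z).min(),
    }

# Toy instance (our own illustration): n = 2, constraints diag(X) = 1.
C = np.array([[0.0, 1.0], [1.0, 0.0]])
A_list = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
b = np.ones(2)
X = np.array([[1.0, -1.0], [-1.0, 1.0]])
w = np.array([-1.0, -1.0])
Z = C - sum(wi * A for wi, A in zip(w, A_list))  # Z(w) = [[1, 1], [1, 1]]
print(kkt_residuals(A_list, b, C, X, w, Z))
```

All three residuals vanish and both minimal eigenvalues are zero up to rounding, and the primal value $\langle C,X\rangle$ equals the dual value $\langle b,w\rangle$.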

Definition 2.1 (strict feasibility).

We say that strict feasibility holds for an instance of the primal SDP if there exists a positive definite matrix $X\succ 0$ that satisfies $\mathcal{A}(X)=b$. Similarly, strict feasibility holds for the dual if there exists a vector $w\in\mathbb{R}^{m}$ satisfying $Z(w)\succ 0$.

It is well known that under strict feasibility the KKT conditions are also necessary for optimality. Note that, in general, a pair $(X,Z)$ of optimal solutions satisfies the inclusions $\operatorname{im}X\subseteq\operatorname{ker}Z$ and $\operatorname{im}Z\subseteq\operatorname{ker}X$, where “im” and “ker” denote the image and kernel, respectively.

Definition 2.2 (strict complementarity).

A primal-dual optimal point $(X,Z)$ is said to be strictly complementary if $\operatorname{im}X=\operatorname{ker}Z$ (or, equivalently, $\operatorname{im}Z=\operatorname{ker}X$ ). A primal-dual pair of an instance of SDP satisfies strict complementarity if there exists a strictly complementary primal-dual optimal point $(X,Z)$ .
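Because $\operatorname{im}X\subseteq\operatorname{ker}Z$ always holds for an optimal pair and $\dim\operatorname{ker}Z=n-\operatorname{rank}Z$, strict complementarity is equivalent to the dimension count $\operatorname{rank}X+\operatorname{rank}Z=n$. A numerical check therefore only needs numerical ranks. In the sketch below, the eigenvalue tolerance is an ad hoc assumption, the helper names are ours, and the pair $(X,Z)$ is the optimal pair of the $2\times 2$ toy instance $\operatorname{diag}(X)=1$, $C=\begin{pmatrix}0&1\\1&0\end{pmatrix}$.

```python
import numpy as np

def numerical_rank(M, tol=1e-9):
    # Number of eigenvalues of the symmetric matrix M exceeding tol in absolute value.
    return int(np.sum(np.abs(np.linalg.eigvalsh(M)) > tol))

def strictly_complementary(X, Z, tol=1e-9):
    # For an optimal pair, im X is contained in ker Z, so equality of these
    # subspaces reduces to the dimension count rank X + rank Z = n.
    return numerical_rank(X, tol) + numerical_rank(Z, tol) == X.shape[0]

X = np.array([[1.0, -1.0], [-1.0, 1.0]])  # optimal X of the 2x2 toy instance
Z = np.array([[1.0, 1.0], [1.0, 1.0]])    # corresponding optimal Z
print(strictly_complementary(X, Z))                 # True
print(strictly_complementary(X, np.zeros((2, 2))))  # False: ranks sum to 1 < 2
```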

Definition 2.3 (nondegeneracy).

A primal feasible point $X$ is primal nondegenerate if

\ker\mathcal{A}+\mathcal{T}_{X}=\mathbb{S}^{n}, \qquad (2.2)

with $\mathcal{T}_{X}$ being the tangent space at $X$ to the manifold $\mathcal{M}_{r}$ of symmetric matrices of fixed rank $r$, where $r=\operatorname{rank}X$. If $X=YY^{T}$ is a rank-revealing decomposition, i.e., $Y\in\mathbb{R}^{n\times r}$, then

\mathcal{T}_{X}=\{YV^{T}+VY^{T}\colon V\in\mathbb{R}^{n\times r}\}.

A dual feasible point $Z$ is dual nondegenerate if

\operatorname{im}\mathcal{A}^{*}+\mathcal{T}_{Z}=\mathbb{S}^{n},

where $\mathcal{T}_{Z}$ is the tangent space at $Z$ to the manifold of symmetric matrices of fixed rank $s$, with $s=\operatorname{rank}Z$.
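Condition (2.2) can be tested numerically by comparing dimensions: stack a basis of $\ker\mathcal{A}$ and a spanning set $\{YV^{T}+VY^{T}\}$ of $\mathcal{T}_{X}$ in coordinates of $\mathbb{S}^{n}$ and check that the rank equals $\frac{1}{2}n(n+1)$. The sketch below is our own, assuming dense constraint matrices, an SVD-based null space, and an ad hoc tolerance; an analogous test with $\operatorname{im}\mathcal{A}^{*}$ and $\mathcal{T}_{Z}$ would check dual nondegeneracy. On the $2\times 2$ toy instance with $\operatorname{diag}(X)=1$ and $X=yy^{T}$, $y=(1,-1)^{T}$, the point is nondegenerate; if instead all three entries of $X$ are constrained, then $\ker\mathcal{A}=\{0\}$ and the two-dimensional $\mathcal{T}_{X}$ cannot fill $\mathbb{S}^{2}$.

```python
import numpy as np
from itertools import combinations_with_replacement

def sym_basis(n):
    # Basis of S^n: E_ii and E_ij + E_ji for i < j; dimension n(n+1)/2.
    basis = []
    for i, j in combinations_with_replacement(range(n), 2):
        E = np.zeros((n, n))
        E[i, j] = E[j, i] = 1.0
        basis.append(E)
    return basis

def coords(S, n):
    # Upper-triangular entries: a linear isomorphism between S^n and R^{n(n+1)/2}.
    return S[np.triu_indices(n)]

def primal_nondegenerate(A_list, Y, tol=1e-9):
    """Check ker A + T_X = S^n for X = Y Y^T with Y of full column rank."""
    n, r = Y.shape
    basis = sym_basis(n)
    # Matrix of A in the basis: M[i, p] = <A_i, E_p>; its null space parametrizes ker A.
    M = np.array([[np.sum(A * E) for E in basis] for A in A_list])
    _, s, Vt = np.linalg.svd(M)
    null = Vt[np.sum(s > tol):]  # rows spanning the null space of M
    ker_A = [coords(sum(c * E for c, E in zip(v, basis)), n) for v in null]
    # Spanning set of T_X: Y V^T + V Y^T with V running over unit matrices e_i e_j^T.
    tangent = []
    for i in range(n):
        for j in range(r):
            V = np.zeros((n, r))
            V[i, j] = 1.0
            tangent.append(coords(Y @ V.T + V @ Y.T, n))
    span = np.array(ker_A + tangent)
    return np.linalg.matrix_rank(span, tol=tol) == n * (n + 1) // 2

A_list = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]  # diag(X) = 1 (toy instance)
Y = np.array([[1.0], [-1.0]])
print(primal_nondegenerate(A_list, Y))        # True
print(primal_nondegenerate(sym_basis(2), Y))  # False: every entry of X is constrained
```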

Primal-dual strict feasibility implies the existence of both a primal and a dual optimal solution with a zero duality gap. In addition, primal (dual) nondegeneracy implies dual (primal) uniqueness of the solutions. Under strict complementarity, the converse is also true, that is, primal (dual) uniqueness of the primal-dual optimal solutions implies dual (primal) nondegeneracy of these solutions. Moreover, primal-dual nondegeneracy and strict complementarity hold generically. We refer to [4] for details.

With regard to the time-varying case, these facts can be generalized as follows.

Theorem 2.4.

(Bellon et al., [12, Theorem 2.19]) Let (P_$t$,D_$t$) be a primal-dual pair of TV-SDPs parametrized over a time interval $[0,T]$ such that primal-dual strict feasibility holds for every $t\in[0,T]$, and assume that the data $\mathcal{A}_{t},b_{t},C_{t}$ are continuously differentiable functions of $t$. Let $t^{*}\in[0,T]$ be a fixed value of the time parameter and suppose that $(X^{*},Z^{*})$ is a nondegenerate optimal and strictly complementary point for (P_$t^{*}$,D_$t^{*}$). Then there exist $\varepsilon>0$ and a unique continuously differentiable map $t\mapsto(X_{t},Z_{t})$ defined on $(t^{*}-\varepsilon,t^{*}+\varepsilon)$ such that $(X_{t},Z_{t})$ is a unique and strictly complementary primal-dual optimal point of (P_$t$,D_$t$) for all $t\in(t^{*}-\varepsilon,t^{*}+\varepsilon)$. In particular, the ranks of $X_{t}$ and $Z_{t}$ are constant for all $t\in(t^{*}-\varepsilon,t^{*}+\varepsilon)$.

The last statement of the theorem follows directly from the fact that a change in the rank of either $X_{t}$ or $Z_{t}$ implies a loss of strict complementarity because of the lower semicontinuity of the rank. Based on these facts, we make the following assumptions on the original problem (SDP_$t$).

(A1) (SDP_$t$) and (D-SDP_$t$) are strictly feasible for every $t\in[0,T]$.

(A2) The linear operator $\mathcal{A}_{t}$ is surjective for every $t\in[0,T]$.

(A3) For every $t\in[0,T]$, (SDP_$t$) has a primal nondegenerate solution $X_{t}$ and (D-SDP_$t$) has a dual nondegenerate solution $Z_{t}$.

(A4) The solution pair $(X_{t},Z_{t})$ is strictly complementary for every $t\in[0,T]$.

(A5) The data $\mathcal{A}_{t},b_{t},C_{t}$ are continuously differentiable functions of $t$.

Assumptions (A1) and (A2) are standard for SDPs and for linearly constrained optimization in general, while assumptions (A3)–(A4) rule out many “pathological” cases [12]. In particular, assumption (A3) implies that the solution pair $(X_{t},Z_{t})$ is unique. By Theorem 2.4, applied around each $t\in[0,T]$, assumptions (A1) and (A3)–(A5) have the following consequences:

(C1) (SDP_$t$) has a unique and smooth solution curve $X_{t}$, $t\in[0,T]$.

(C2) The curve $t\mapsto X_{t}$ is of constant rank $r^{*}$.

For setting up the factorized version (BM_$t$) of (SDP_$t$), it is necessary to choose the dimension $r$ of the factor matrix $Y$ in (BM_$t$), ideally equal to $r^{*}$ of (C2). In what follows, we assume that we know the constant rank $r^{*}$ . Given access to an initial solution $X_{0}$ at time $t=0$ , it is possible to compute $r^{*}$ , so this assumption is without further loss of generality.

It is worth noting that the rank cannot be arbitrary. By a well-known result of Barvinok and Pataki [10], any SDP defined by $m$ linearly independent constraints that admits an optimal solution also admits an optimal solution of rank $r$ with $\frac{1}{2}r(r+1)\leq m$. Since we assume that $X_{t}$ is the unique solution to (SDP_$t$) and has constant rank $r^{*}$, we conclude that

\frac{1}{2}r^{*}(r^{*}+1)\leq m.
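The bound determines the largest admissible rank directly from $m$, since $\frac{1}{2}r(r+1)\leq m$ is equivalent to $(2r+1)^{2}\leq 8m+1$. The small Python helper below (its name is ours) evaluates it in exact integer arithmetic. For instance, a Max-Cut SDP on $n=100$ nodes has $m=100$ diagonal constraints, so under the assumptions above the rank $r^{*}$ of its unique solution is at most $13$.

```python
from math import isqrt

def barvinok_pataki_max_rank(m: int) -> int:
    """Largest integer r with r(r+1)/2 <= m."""
    # r(r+1)/2 <= m  <=>  (2r+1)^2 <= 8m+1, so r = floor((isqrt(8m+1) - 1) / 2) exactly.
    return (isqrt(8 * m + 1) - 1) // 2

for m in (1, 3, 6, 100):
    print(m, barvinok_pataki_max_rank(m))
# prints: 1 1 / 3 2 / 6 3 / 100 13 (one pair per line)
```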

We point out that recently the Barvinok–Pataki bound has been slightly improved [33].

3 Quotient geometry of positive semidefinite rank-$r$ matrices

We now investigate the factorized formulation (BM_$t$) in more detail. As already mentioned, in contrast to the original problem (SDP_$t$), this is a nonlinear problem (specifically, a quadratically constrained quadratic problem) which is nonconvex. Moreover, the property of uniqueness of a solution, which is guaranteed by (C1) for the original problem (SDP_$t$), is lost in (BM_$t$), because its representation via the map

\phi:\mathbb{R}^{n\times r}\to\mathbb{S}^{n},\quad\phi(Y)=YY^{T}

is not unique. In fact, this map is invariant under the orthogonal group action

\mathcal{O}_{r}\times\mathbb{R}^{n\times r}\to\mathbb{R}^{n\times r},\quad(Q,Y)\mapsto YQ,

on $\mathbb{R}^{n\times r}$ , where

\mathcal{O}_{r}:=\{Q\in\mathbb{R}^{r\times r}\ :\ QQ^{T}=I_{r}\},

with $I_{r}$ denoting the $r\times r$ identity matrix, is the orthogonal group. Hence both the objective function $Y\mapsto\langle C_{t},YY^{T}\rangle$ and the constraints $\mathcal{A}_{t}(YY^{T})=b_{t}$ in (BM_$t$) are invariant under the same action. As a consequence, the solutions of (BM_$t$) are never isolated [36]. This poses a technical obstacle to the use of path-following algorithms, as the path needs to be, at least locally, uniquely defined.

On the other hand, assuming that the correct rank $r=r^{*}$ of the unique solution $X_{t}$ of (SDP_$t$) has been chosen for the factorization, any solution $Y_{t}$ of (BM_$t$) must satisfy $Y_{t}Y_{t}^{T}=X_{t}$. It follows that any solution is of the form $Y_{t}Q$ with $Q\in\mathcal{O}_{r}$; see, e.g., [17, Lemma 2.1]. In other words, the action of the orthogonal group is indeed the only source of nonuniqueness. This corresponds to the well-known fact that the set of positive semidefinite symmetric matrices of fixed rank $r$, which we denote by $\mathcal{M}_{r}^{+}$, is a smooth manifold that can be identified with the quotient manifold $\mathbb{R}_{*}^{n\times r}/\mathcal{O}_{r}$, where $\mathbb{R}_{*}^{n\times r}$ is the open set of $n\times r$ matrices with full column rank.
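Both directions of this statement are easy to check numerically. The sketch below (random data, our own illustration) verifies that $\phi(YQ)=\phi(Y)$ for an orthogonal $Q$ and, conversely, recovers the orthogonal factor relating two factors of the same matrix as $Q_{0}=Y^{\dagger}W$. This works because $WW^{T}=YY^{T}$ forces $\operatorname{im}W=\operatorname{im}Y$, hence $W=YQ_{0}$, and then $Y(I-Q_{0}Q_{0}^{T})Y^{T}=0$ with $Y$ of full column rank gives $Q_{0}Q_{0}^{T}=I_{r}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 8, 3
Y = rng.standard_normal((n, r))                   # full column rank with probability one
Q, _ = np.linalg.qr(rng.standard_normal((r, r)))  # a random orthogonal r x r matrix

# phi(Y Q) = Y Q Q^T Y^T = phi(Y): the factorization is invariant under O_r.
print(np.allclose((Y @ Q) @ (Y @ Q).T, Y @ Y.T))  # True

# Converse: W with W W^T = Y Y^T (built here as Y Q) is recovered as W = Y Q0
# with Q0 = Y^+ W, computed from W alone.
W = Y @ Q
Q0 = np.linalg.pinv(Y) @ W
print(np.allclose(Q0 @ Q0.T, np.eye(r)), np.allclose(Y @ Q0, W))  # True True
```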

In the following, we describe how the nonuniqueness can be removed by introducing a so-called horizontal space, which is a standard concept in optimization on quotient manifolds; see, e.g., [2, section 3.5.8]. For positive semidefinite fixed-rank matrices, this has been worked out in detail in [39]. Additional material, including the complex Hermitian case, can be found in [8]. However, to arrive at practical formulas for our path-following algorithm, we will not refer further to the concept of a quotient manifold but focus directly on the injectivity of the map $\phi$ on suitable linear subspaces of $\mathbb{R}^{n\times r}$, as described in the following subsection. This simplification is possible because the quotient manifold $\mathbb{R}_{*}^{n\times r}/\mathcal{O}_{r}$ is built from $\mathbb{R}_{*}^{n\times r}$, which is just an open subset of $\mathbb{R}^{n\times r}$: the horizontal space at a point $Y$ is a subspace of the tangent space of $\mathbb{R}_{*}^{n\times r}$ at $Y$, and this tangent space is all of $\mathbb{R}^{n\times r}$.

3.1 Horizontal space and unique factorizations

Given $Y\in\mathbb{R}^{n\times r}_{*}$ , we denote the corresponding orbit under the orthogonal group as

Y\mathcal{O}_{r}:=\{YQ\,:\,Q\in\mathcal{O}_{r}\}\subseteq\mathbb{R}^{n\times r}_{*}.

The orbit $Y\mathcal{O}_{r}$ is an embedded submanifold of $\mathbb{R}^{n\times r}_{*}$ of dimension $\tfrac{1}{2}r(r-1)$ with two connected components, according to $\operatorname{det}Q=\pm 1$ . Its tangent space at $Y$ , which we denote by $\mathcal{T}_{Y}$ , is easily derived by noting that the tangent space to the orthogonal group $\mathcal{O}_{r}$ at the identity matrix equals the space of real skew-symmetric matrices $\mathbb{S}^{r}_{skew}$ (see, e.g., [2, Example 3.5.3]). Therefore,

\mathcal{T}_{Y}=\{YS\,:\,S\in\mathbb{S}^{r}_{skew}\}.

Since the map $\phi(Y)=YY^{T}$ is constant on $Y\mathcal{O}_{r}$ , its derivative

H\mapsto\phi^{\prime}(Y)[H]=YH^{T}+HY^{T}

vanishes on $\mathcal{T}_{Y}$ , that is $\mathcal{T}_{Y}\subseteq\operatorname{ker}\phi^{\prime}(Y)$ .
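Both facts about $\phi^{\prime}(Y)$ can be confirmed on random data. Since $\phi$ is quadratic, a central finite difference reproduces $\phi^{\prime}(Y)[H]$ up to rounding, and the directions $YS$ with $S$ skew-symmetric are annihilated because $Y(S+S^{T})Y^{T}=0$. The sizes, seed, and step size below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 7, 3
Y = rng.standard_normal((n, r))
phi = lambda Y: Y @ Y.T
dphi = lambda Y, H: Y @ H.T + H @ Y.T  # phi'(Y)[H]

# Central difference: phi is quadratic, so this matches phi'(Y)[H] up to rounding.
H = rng.standard_normal((n, r))
eps = 1e-3
fd = (phi(Y + eps * H) - phi(Y - eps * H)) / (2 * eps)
print(np.allclose(fd, dphi(Y, H)))     # True

# Tangent directions Y S of the orbit, S skew-symmetric, lie in the kernel of phi'(Y).
S = rng.standard_normal((r, r))
S = S - S.T
print(np.allclose(dphi(Y, Y @ S), 0))  # True
```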

The horizontal space at $Y$ , denoted by $\mathcal{H}_{Y}$ , is the orthogonal complement of $\mathcal{T}_{Y}$ with respect to the Frobenius inner product. One verifies that

\mathcal{H}_{Y}:=\mathcal{T}_{Y}^{\perp}=\{H\in\mathbb{R}^{n\times r}\,:\,Y^{T}H=H^{T}Y\},

since $0=\langle H,YS\rangle=\langle Y^{T}H,S\rangle$ holds for all skew-symmetric $S$ if and only if $Y^{T}H$ is symmetric. We point out that sometimes any subspace complementary to $\mathcal{T}_{Y}$ is called a horizontal space, but we will stick to the above choice, as it is the most common and has certain theoretical and practical advantages. In particular, since $Y\in\mathcal{H}_{Y}$ , the affine space $Y+\mathcal{H}_{Y}$ equals $\mathcal{H}_{Y}$ , so it is just a linear space.
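The orthogonal splitting $\mathbb{R}^{n\times r}=\mathcal{T}_{Y}\oplus\mathcal{H}_{Y}$ can be computed explicitly. Writing $Z=YS+H$ with $S$ skew-symmetric and $H\in\mathcal{H}_{Y}$, symmetry of $Y^{T}H=Y^{T}Z-Y^{T}YS$ is equivalent to the Lyapunov equation $(Y^{T}Y)S+S(Y^{T}Y)=Y^{T}Z-Z^{T}Y$, which has a unique, skew-symmetric solution because $Y^{T}Y$ is positive definite. This is a standard construction in quotient-manifold optimization rather than something stated in this section; the NumPy sketch below, with hypothetical helper name `horizontal_projection`, solves the equation entrywise in the eigenbasis of $Y^{T}Y$.

```python
import numpy as np

def horizontal_projection(Y, Z):
    """Split Z = Y S + H with S skew-symmetric and H in the horizontal space H_Y."""
    lam, W = np.linalg.eigh(Y.T @ Y)   # Y^T Y = W diag(lam) W^T with lam > 0
    R = Y.T @ Z - Z.T @ Y              # skew-symmetric right-hand side
    Rt = W.T @ R @ W
    # In the eigenbasis the Lyapunov equation decouples: St_ij = Rt_ij / (lam_i + lam_j).
    S = W @ (Rt / (lam[:, None] + lam[None, :])) @ W.T
    return Z - Y @ S, S

rng = np.random.default_rng(3)
n, r = 9, 4
Y = rng.standard_normal((n, r))
Z = rng.standard_normal((n, r))
H, S = horizontal_projection(Y, Z)

print(np.allclose(Y.T @ H, H.T @ Y))          # True: H lies in H_Y
print(np.allclose(S, -S.T))                   # True: Y S lies in T_Y
S2 = rng.standard_normal((r, r))
S2 = S2 - S2.T
print(np.isclose(np.sum(H * (Y @ S2)), 0.0))  # True: H is orthogonal to T_Y
```

Applied to $Z=Y$, the function returns $H=Y$ and $S=0$, consistent with $Y\in\mathcal{H}_{Y}$.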

The purpose of the horizontal space is to provide a unique way of representing a neighborhood of $X=YY^{T}$ in $\mathcal{M}_{r}^{+}$ through $\phi(Y+H)=(Y+H)(Y+H)^{T}$ with $H\in\mathcal{H}_{Y}$ . Clearly,

\dim\mathcal{H}_{Y}=nr-\dim\mathcal{O}_{r}=nr-\frac{1}{2}r(r-1)=\dim\mathcal{M}_{r}^{+}.

Moreover, the following holds.

Proposition 3.1.

The restriction of $\phi^{\prime}(Y)$ to $\mathcal{H}_{Y}$ is injective. In particular, it holds that

\|YH^{T}+HY^{T}\|_{F}\geq\sqrt{2}\,\sigma_{r}(Y)\|H\|_{F}\quad\text{for all }H\in\mathcal{H}_{Y},

where $\sigma_{r}(Y)>0$ is the smallest singular value of $Y$ . This lower bound is sharp if $r<n$ . For $r=n$ one has the sharp estimate

\|YH^{T}+HY^{T}\|_{F}\geq 2\sigma_{r}(Y)\|H\|_{F}\quad\text{for all }H\in\mathcal{H}_{Y}.

As a consequence, in either case, $\operatorname{ker}\phi^{\prime}(Y)=\mathcal{T}_{Y}$ .

Proof.

For $Z\in\mathbb{S}^{n}$ we have $\operatorname{trace}((YH^{T}+HY^{T})Z)=2\operatorname{trace}(ZYH^{T})$ by standard properties of the trace. Taking $Z=YH^{T}+HY^{T}$ yields

\|YH^{T}+HY^{T}\|_{F}^{2}=2\operatorname{trace}(YH^{T}YH^{T}+HY^{T}YH^{T})=2\|Y^{T}H\|_{F}^{2}+2\|YH^{T}\|_{F}^{2}.

To derive the second equality, we used $Y^{T}H=H^{T}Y$ for $H\in\mathcal{H}_{Y}$. Clearly, $\|YH^{T}\|_{F}^{2}\geq\sigma_{r}(Y)^{2}\|H\|_{F}^{2}$, and if $r=n$, we also have $\|Y^{T}H\|_{F}^{2}\geq\sigma_{r}(Y)^{2}\|H\|_{F}^{2}$. This proves the asserted lower bounds. To show that they are sharp, let $(u_{r},v_{r})$ be a normalized singular vector pair such that $Yv_{r}=\sigma_{r}(Y)u_{r}$. If $r<n$, then for any nonzero $u$ with $u^{T}Y=0$, one verifies that the matrix $H=uv_{r}^{T}$ is in $\mathcal{H}_{Y}$ and achieves equality. When $r=n$, $H=u_{r}v_{r}^{T}$ achieves it. ∎
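Proposition 3.1 can be sanity-checked on random data. The sketch below samples horizontal vectors through the parametrization $H=U\Sigma^{-1}MV^{T}+U_{\perp}BV^{T}$ with $M$ symmetric, which gives $Y^{T}H=VMV^{T}$ and covers all of $\mathcal{H}_{Y}$, compares the ratio $\|YH^{T}+HY^{T}\|_{F}/\|H\|_{F}$ with $\sqrt{2}\,\sigma_{r}(Y)$, and evaluates the extremal direction $H=uv_{r}^{T}$ from the proof. Sizes, seed, and sample count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n, r = 10, 3
Y = rng.standard_normal((n, r))
U, sig, Vt = np.linalg.svd(Y)  # full U (n x n); singular values in decreasing order
Ur, Uperp, V = U[:, :r], U[:, r:], Vt.T

def random_horizontal():
    # H = Ur Sigma^{-1} M V^T + Uperp B V^T with M symmetric gives Y^T H = V M V^T.
    M = rng.standard_normal((r, r))
    M = M + M.T
    B = rng.standard_normal((n - r, r))
    return Ur @ np.diag(1 / sig) @ M @ V.T + Uperp @ B @ V.T

ratio = lambda H: np.linalg.norm(Y @ H.T + H @ Y.T) / np.linalg.norm(H)
ratios = [ratio(random_horizontal()) for _ in range(1000)]
print(min(ratios) >= np.sqrt(2) * sig[-1] * (1 - 1e-12))  # True: lower bound holds

# Sharpness for r < n: H = u v_r^T with u orthogonal to im Y attains sqrt(2) * sigma_r(Y).
H_sharp = np.outer(Uperp[:, 0], V[:, -1])
print(np.isclose(ratio(H_sharp), np.sqrt(2) * sig[-1]))   # True
```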

Since $\phi$ maps $\mathbb{R}_{*}^{n\times r}$ to $\mathcal{M}_{r}^{+}$, which has the same dimension as $\mathcal{H}_{Y}$, the above proposition implies that $\phi^{\prime}(Y)$ is a bijection between $\mathcal{H}_{Y}$ and $\mathcal{T}_{\phi(Y)}\mathcal{M}_{r}^{+}$. By the inverse function theorem, this already shows that the restriction of $\phi$ to the linear space $Y+\mathcal{H}_{Y}=\mathcal{H}_{Y}$ is a local diffeomorphism between a neighborhood of $Y$ in $\mathcal{H}_{Y}$ and a neighborhood of $\phi(Y)$ in $\mathcal{M}_{r}^{+}$. The following, more quantitative statement matches Theorem 6.3 in [39] on the injectivity radius of the quotient manifold $\mathbb{R}_{*}^{n\times r}/\mathcal{O}_{r}$. For convenience, we provide a self-contained proof that is more algebraic and does not require the concept of quotient manifolds.

Proposition 3.2.

Let $\mathcal{B}_{Y}:=\{H\in\mathcal{H}_{Y}\colon\|H\|_{F}<\sigma_{r}(Y)\}$. Then the restriction of $\phi$ to $Y+\mathcal{B}_{Y}$ is injective and maps $Y+\mathcal{B}_{Y}$ diffeomorphically onto a (relatively) open neighborhood of $\phi(Y)=YY^{T}$ in $\mathcal{M}_{r}^{+}$.

It is interesting to note that $\mathcal{B}_{Y}$ is the largest possible ball in $\mathcal{H}_{Y}$ on which the result can hold, since the rank-one matrices $\sigma_{i}u_{i}v_{i}^{T}$ comprised of singular pairs of $Y$ all belong to $\mathcal{H}_{Y}$ and $Y-\sigma_{r}u_{r}v_{r}^{T}$ is rank-deficient. Another important observation is that $\sigma_{r}(Y)$ does not depend on the particular choice of $Y$ within the orbit $Y\mathcal{O}_{r}$ .
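The two observations in the preceding paragraph can be reproduced numerically. In the sketch below (random data, our own illustration), $H=-\sigma_{r}u_{r}v_{r}^{T}$ is horizontal, has norm exactly $\sigma_{r}(Y)$, so it lies on the boundary of $\mathcal{B}_{Y}$, and $Y+H$ drops to rank $r-1$; moreover, $\sigma_{r}$ is unchanged when $Y$ is replaced by $YQ$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, r = 6, 3
Y = rng.standard_normal((n, r))
U, sig, Vt = np.linalg.svd(Y, full_matrices=False)
u_r, v_r, s_r = U[:, -1], Vt[-1], sig[-1]

H = -s_r * np.outer(u_r, v_r)               # on the boundary of B_Y
print(np.allclose(Y.T @ H, H.T @ Y))        # True: H lies in the horizontal space
print(np.isclose(np.linalg.norm(H), s_r))   # True: ||H||_F = sigma_r(Y)
print(np.linalg.matrix_rank(Y + H))         # 2, i.e. r - 1: Y + H is rank-deficient

# sigma_r(Y) is the same for every representative Y Q of the orbit.
Q, _ = np.linalg.qr(rng.standard_normal((r, r)))
print(np.isclose(np.linalg.svd(Y @ Q, compute_uv=False)[-1], s_r))  # True
```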

Proof.

Consider $H_{1},H_{2}\in\mathcal{B}_{Y}$. Let $Y=U\Sigma V^{T}$ be a singular value decomposition of $Y$ with $U\in\mathbb{R}^{n\times r}$ and $V\in\mathbb{R}^{r\times r}$ having orthonormal columns. Assume first that $r<n$, and let $U_{\perp}\in\mathbb{R}^{n\times(n-r)}$ be a matrix with orthonormal columns such that $U^{T}U_{\perp}=0$. In the case $r=n$, the terms involving $U_{\perp}$ in the following calculation are simply absent. We write

H_{1}=UA_{1}V^{T}+U_{\perp}B_{1}V^{T},\quad H_{2}=UA_{2}V^{T}+U_{\perp}B_{2}V^{T}.

Since $H_{1},H_{2}\in\mathcal{H}_{Y}$ , we have

\Sigma A_{1}=A_{1}^{T}\Sigma,\quad\Sigma A_{2}=A_{2}^{T}\Sigma.

Then a direct calculation yields

\begin{aligned}(Y+H_{1})(Y+H_{1})^{T}-YY^{T}&=U[\Sigma A_{1}^{T}+A_{1}\Sigma+A_{1}A_{1}^{T}]U^{T}\\&\quad+U[\Sigma+A_{1}]B_{1}^{T}U_{\perp}^{T}+U_{\perp}B_{1}[\Sigma+A_{1}^{T}]U^{T}+U_{\perp}B_{1}B_{1}^{T}U_{\perp}^{T},\end{aligned}

and analogously for $(Y+H_{2})(Y+H_{2})^{T}-YY^{T}$. Since the four terms in the above sum are mutually orthogonal in the Frobenius inner product, the equality $(Y+H_{1})(Y+H_{1})^{T}=(Y+H_{2})(Y+H_{2})^{T}$ in particular implies

\Sigma A_{1}^{T}+A_{1}\Sigma+A_{1}A_{1}^{T}=\Sigma A_{2}^{T}+A_{2}\Sigma+A_{2}A_{2}^{T},

as well as

(\Sigma+A_{1})B_{1}^{T}=(\Sigma+A_{2})B_{2}^{T}. \qquad (3.1)

The first of these equations can be written as

\Sigma(A_{1}-A_{2})^{T}+(A_{1}-A_{2})\Sigma=A_{2}(A_{2}-A_{1})^{T}-(A_{1}-A_{2})A_{1}^{T}.

By Proposition 3.1 (with $n=r$ , $Y=\Sigma$ and $H=A_{1}-A_{2}$ ),

\|\Sigma(A_{1}-A_{2})^{T}+(A_{1}-A_{2})\Sigma\|_{F}\geq 2\sigma_{r}(Y)\|A_{1}-A_{2}\|_{F},

whereas

\|A_{2}(A_{2}-A_{1})^{T}-(A_{1}-A_{2})A_{1}^{T}\|_{F}\leq(\|H_{2}\|_{F}+\|H_{1}\|_{F})\|A_{1}-A_{2}\|_{F}.

Since $\|H_{2}\|_{F}+\|H_{1}\|_{F}<2\sigma_{r}(Y)$, this shows that we must have $A_{1}=A_{2}$. By (3.1), this also implies $B_{1}=B_{2}$, since $\Sigma+A_{1}$ is invertible: its smallest singular value is at least $\sigma_{r}(Y)-\|A_{1}\|_{2}\geq\sigma_{r}(Y)-\|H_{1}\|_{F}>0$.

Hence, we have proven that $\phi$ is an injective map from $Y+\mathcal{B}_{Y}$ to $\mathcal{M}_{r}^{+}$. To show that it is a diffeomorphism onto its image, it suffices to show that it is a local diffeomorphism, for which, in turn, it suffices to confirm that $\phi^{\prime}(Y+H)$ is injective on $\mathcal{H}_{Y}$ for every $H\in\mathcal{B}_{Y}$ (since $\mathcal{H}_{Y}$ and $\mathcal{M}_{r}^{+}$ have the same dimension). It follows from Proposition 3.1 (with $Y$ replaced by $Y+H$, which has full column rank since $\|H\|_{F}<\sigma_{r}(Y)$) that the null space of $\phi^{\prime}(Y+H)$ equals $\mathcal{T}_{Y+H}$. We claim that $\mathcal{T}_{Y+H}\cap\mathcal{H}_{Y}=\{0\}$, which proves the injectivity of $\phi^{\prime}(Y+H)$ on $\mathcal{H}_{Y}$. Indeed, let $K$ be an element of the intersection, i.e., $K=(Y+H)S$ for some skew-symmetric $S$ and $Y^{T}K-K^{T}Y=0$. Inserting the first relation into the second and using $Y^{T}H=H^{T}Y$ yields the homogeneous Lyapunov equation

(Y^{T}Y+Y^{T}H)S+S(Y^{T}Y+Y^{T}H)=0. \qquad (3.2)

The symmetric matrix

Y^{T}Y+Y^{T}H=\frac{1}{2}(Y+H)^{T}(Y+H)+\frac{1}{2}(Y^{T}Y-H^{T}H)

in (3.2) is positive definite, since $\lambda_{1}(H^{T}H)\leq\|H^{T}H\|_{F}\leq\|H\|_{F}^{2}<\sigma_{r}(Y)^{2}=\lambda_{r}(Y^{T}Y)$ (here $\lambda_{i}$ denotes the $i$-th largest eigenvalue of the corresponding matrix). Since the Lyapunov operator $S\mapsto MS+SM$ is invertible for any positive definite $M$, (3.2) implies $S=0$, that is, $K=0$. ∎

Finally, it is also possible to provide a lower bound on the radius of the largest ball around $X=YY^{T}$ such that its intersection with $\mathcal{M}_{r}^{+}$ is in the image $\phi(Y+\mathcal{B}_{Y})$ (so that an inverse map $\phi^{-1}$ is defined).

Proposition 3.3.

Any $\tilde{X}\in\mathcal{M}_{r}^{+}$ satisfying $\|\tilde{X}-X\|_{F}<\frac{2\lambda_{r}(X)}{\sqrt{r+4}+\sqrt{r}}$ is in the image $\phi(Y+\mathcal{B}_{Y})$ , that is, there exists a unique $H\in\mathcal{B}_{Y}$ such that $\tilde{X}=(Y+H)(Y+H)^{T}$ .

Observe that the slightly cleaner condition

\|\tilde{X}-X\|_{F}\leq\frac{\lambda_{r}(X)}{\sqrt{r+4}} \qquad (3.3)

is also sufficient in the proposition, since $\sqrt{r+4}+\sqrt{r}<2\sqrt{r+4}$.

Proof.

Let $\tilde{X}=\tilde{Z}\tilde{Z}^{T}$ with $\tilde{Z}\in\mathbb{R}^{n\times r}$ and assume a polar decomposition of $Y^{T}\tilde{Z}=P\tilde{Q}^{T}$ , where $P,\tilde{Q}\in\mathbb{R}^{r\times r}$ , $P$ is positive semidefinite, and $\tilde{Q}$ is orthogonal. Let $Z=\tilde{Z}\tilde{Q}$ . Then

H=Z-Y \qquad (3.4)

satisfies $(Y+H)(Y+H)^{T}=\tilde{X}$ , and since $Y^{T}H=P-Y^{T}Y$ is symmetric, we have $H\in\mathcal{H}_{Y}$ . We need to show $H\in\mathcal{B}_{Y}$ , that is, $\|H\|_{F}<\sigma_{r}(Y)$ . Proposition 3.2 then implies that $H$ is unique in $\mathcal{B}_{Y}$ . Let $YY^{\dagger}$ be the orthogonal projector onto the column span of $Y$ and $Z_{1}=YY^{\dagger}Z$ . With that, we have the decomposition

\|H\|_{F}^{2}=\|YY^{\dagger}H\|_{F}^{2}+\|(I-YY^{\dagger})H\|_{F}^{2}=\|Z_{1}-Y\|_{F}^{2}+\|(I-YY^{\dagger})Z\|_{F}^{2}. \qquad (3.5)

We estimate both terms separately. Since $Y^{T}Z_{1}=Y^{T}Z=P$ is symmetric and positive semidefinite, the first term satisfies

\begin{aligned}\|Z_{1}-Y\|_{F}^{2}&=\|Z_{1}\|_{F}^{2}-2\operatorname{trace}(Y^{T}Z_{1})+\|Y\|_{F}^{2}\\&=\|(Z_{1}Z_{1}^{T})^{1/2}\|_{F}^{2}-2\sum_{i=1}^{r}\sigma_{i}(Y^{T}Z_{1})+\|(YY^{T})^{1/2}\|_{F}^{2}.\end{aligned}\qquad(3.6)

A simple consideration using a singular value decomposition of $Y$ and $Z_{1}$ reveals that

(YY^{T})^{1/2}(Z_{1}Z_{1}^{T})^{1/2}=\tilde{U}Y^{T}Z_{1}\tilde{V}^{T}

for some $\tilde{U}$ and $\tilde{V}$ with orthonormal columns. Consequently, by von Neumann’s trace inequality (see, e.g., [32, Theorem 7.4.1.1]), we have

\operatorname{trace}((YY^{T})^{1/2}(Z_{1}Z_{1}^{T})^{1/2})\leq\sum_{i=1}^{r}\sigma_{i}(Y^{T}Z_{1}).

Inserting this into (3.6) yields

\|Z_{1}-Y\|_{F}^{2}\leq\|(Z_{1}Z_{1}^{T})^{1/2}-(YY^{T})^{1/2}\|_{F}^{2}.

We remark that this inequality could also have been concluded from [8, Theorem 2.7], where it is stated as well. By the same argument, it holds for any $Z_{1}$ for which $Y^{T}Z_{1}$ is symmetric and positive semidefinite (in particular, for $Z_{1}$ replaced with the initial $Z$). Now let $Y=U\Sigma V^{T}$ be a singular value decomposition of $Y$, with $\sigma_{r}(Y)$ the smallest positive singular value. Then $Z_{1}Z_{1}^{T}=US^{2}U^{T}$ for some positive semidefinite $S^{2}\in\mathbb{R}^{r\times r}$, and it follows from well-known results (cf. [50]) that¹

\|(Z_{1}Z_{1}^{T})^{1/2}-(YY^{T})^{1/2}\|_{F}^{2}=\|S-\Sigma\|_{F}^{2}\leq\frac{1}{\sigma_{r}(Y)^{2}}\|S^{2}-\Sigma^{2}\|_{F}^{2}=\frac{1}{\sigma_{r}(Y)^{2}}\|Z_{1}Z_{1}^{T}-YY^{T}\|_{F}^{2}.

¹ For completeness, we provide the proof. The matrix $S-\Sigma$ is the unique solution of the matrix equation $\mathcal{L}(M)=SM+M\Sigma=S^{2}-\Sigma^{2}$. Indeed, the linear operator $\mathcal{L}$ on $\mathbb{R}^{r\times r}$ is symmetric in the Frobenius inner product and has positive eigenvalues $\lambda_{i,j}=\lambda_{i}(S)+\Sigma_{jj}\geq\sigma_{r}(Y)$ (the eigenvectors are the rank-one matrices $w_{i}e_{j}^{T}$, with $w_{i}$ the eigenvectors of $S$). Hence $\|S^{2}-\Sigma^{2}\|_{F}=\|\mathcal{L}(S-\Sigma)\|_{F}\geq\sigma_{r}(Y)\|S-\Sigma\|_{F}$.

Noting that $Z_{1}Z_{1}^{T}=(YY^{\dagger})\tilde{X}(YY^{\dagger})$ and $YY^{T}=(YY^{\dagger})X(YY^{\dagger})$ we conclude the first part with

\|Z_{1}-Y\|_{F}^{2}\leq\frac{1}{\sigma_{r}(Y)^{2}}\|(YY^{\dagger})(\tilde{X}-X)(YY^{\dagger})\|_{F}^{2}\leq\frac{1}{\sigma_{r}(Y)^{2}}\|\tilde{X}-X\|_{F}^{2}. \qquad (3.7)

The second term in (3.5) can be estimated as follows:

\begin{aligned}\|(I-YY^{\dagger})Z\|_{F}^{2}&=\operatorname{trace}((I-YY^{\dagger})\tilde{X}(I-YY^{\dagger}))\\&\leq\sqrt{r}\,\|(I-YY^{\dagger})\tilde{X}(I-YY^{\dagger})\|_{F}\\&=\sqrt{r}\,\|(I-YY^{\dagger})(\tilde{X}-X)(I-YY^{\dagger})\|_{F}\leq\sqrt{r}\,\|\tilde{X}-X\|_{F},\end{aligned}\qquad(3.8)

where we used the Cauchy–Schwarz inequality on the eigenvalues and the fact that $(I-YY^{\dagger})\tilde{X}(I-YY^{\dagger})$ is positive semidefinite with rank at most $r$.

As a result, combining (3.5) with (3.7) and (3.8), we obtain

\|H\|_{F}^{2}\leq\frac{1}{\sigma_{r}(Y)^{2}}\|\tilde{X}-X\|_{F}^{2}+\sqrt{r}\|\tilde{X}-X\|_{F}.

(3.9)

The right side is strictly smaller than $\sigma_{r}(Y)^{2}$ when

\|\tilde{X}-X\|_{F}<-\frac{\sigma_{r}(Y)^{2}\sqrt{r}}{2}+\sqrt{\frac{\sigma_{r}(Y)^{4}r}{4}+\sigma_{r}(Y)^{4}}=\frac{\sigma_{r}(Y)^{2}}{2}(\sqrt{r+4}-\sqrt{r})=\frac{2\lambda_{r}(X)}{\sqrt{r+4}+\sqrt{r}},

which proves the assertion. ∎
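The estimates in this proof lend themselves to a quick numerical sanity check. The following sketch assumes only NumPy; the helper names `psd_sqrt`, `aligned_distance_sq` and `rhs_39` and the random test data are ours and not part of the method. It tests the footnoted bound $\|S-\Sigma\|_{F}\leq\|S^{2}-\Sigma^{2}\|_{F}/\sigma_{r}(Y)$, then the estimate (3.9) after aligning a perturbed factor $\tilde{Z}$ with $Y$ through the polar decomposition of $Y^{T}\tilde{Z}$ (as in Remark 3.4), and finally that $2\lambda_{r}(X)/(\sqrt{r+4}+\sqrt{r})$ is the positive root of $a^{2}/\sigma_{r}(Y)^{2}+\sqrt{r}\,a=\sigma_{r}(Y)^{2}$. Passing random trials is, of course, no substitute for the proof.

```python
import numpy as np

def psd_sqrt(M):
    """Positive semidefinite square root of a symmetric PSD matrix."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T   # clip tiny negative round-off

def aligned_distance_sq(Y, Zt):
    """||H||_F^2 for H = Y - Zt @ Qt, with Qt from the polar decomposition Y^T Zt = P Qt^T."""
    U, _, Vt = np.linalg.svd(Y.T @ Zt)
    Qt = Vt.T @ U.T
    return np.linalg.norm(Y - Zt @ Qt, "fro") ** 2

def rhs_39(Y, Xt):
    """Right-hand side of (3.9) with X = Y Y^T."""
    r = Y.shape[1]
    sigma_r = np.linalg.svd(Y, compute_uv=False)[-1]
    d = np.linalg.norm(Xt - Y @ Y.T, "fro")
    return d ** 2 / sigma_r ** 2 + np.sqrt(r) * d

rng = np.random.default_rng(1)
n, r = 8, 3

# Footnote: ||S - Sigma||_F <= ||S^2 - Sigma^2||_F / min(diag(Sigma)).
footnote_ok = []
for _ in range(200):
    G = rng.standard_normal((r, r))
    S2, sigma = G @ G.T, rng.uniform(0.5, 2.0, size=r)
    lhs = np.linalg.norm(psd_sqrt(S2) - np.diag(sigma), "fro")
    rhs = np.linalg.norm(S2 - np.diag(sigma ** 2), "fro") / sigma.min()
    footnote_ok.append(lhs <= rhs + 1e-10)

# (3.9) for a perturbed factor Zt of Xt = Zt Zt^T.
bound_ok = []
for _ in range(200):
    Y = rng.standard_normal((n, r))
    Q0, _ = np.linalg.qr(rng.standard_normal((r, r)))   # random orthogonal matrix
    Zt = Y @ Q0 + 0.1 * rng.standard_normal((n, r))
    bound_ok.append(aligned_distance_sq(Y, Zt) <= rhs_39(Y, Zt @ Zt.T) + 1e-10)

# The radius 2 lambda_r(X) / (sqrt(r+4) + sqrt(r)) is the root of a^2/s2 + sqrt(r) a = s2.
s2 = 0.7                                                 # plays the role of sigma_r(Y)^2
a = 2 * s2 / (np.sqrt(r + 4) + np.sqrt(r))
radius_ok = abs(a ** 2 / s2 + np.sqrt(r) * a - s2) < 1e-12
print(all(footnote_ok), all(bound_ok), radius_ok)
```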

Remark 3.4.

From definition (3.4) of $H$, since $\tilde{Q}$ is given by the polar decomposition $Y^{T}\tilde{Z}=P\tilde{Q}^{T}$, it follows that

\|H\|_{F}=\|Y-\tilde{Z}\tilde{Q}\|_{F}=\min_{Q\in\mathcal{O}_{r}}\|Y-\tilde{Z}Q\|_{F},

see, e.g., [32, section 7.4.5]. In general, given any $Y,\tilde{Z}\in\mathbb{R}^{n\times r}$, both of rank $r$, the minimizer $Z=\tilde{Z}\tilde{Q}$ of this problem is obtained by choosing $\tilde{Q}$ from the polar decomposition of $Y^{T}\tilde{Z}$; consequently $Y^{T}Z=P$ is symmetric (and positive semidefinite), that is, $Z$, and hence also $Z-Y$, lies in the horizontal space $\mathcal{H}_{Y}$. In fact, the quantity $\min_{Q\in\mathcal{O}_{r}}\|Y-\tilde{Z}Q\|_{F}$ defines a Riemannian distance between the orbits $Y\mathcal{O}_{r}$ and $\tilde{Z}\mathcal{O}_{r}$ in the corresponding quotient manifold; see [39, Proposition 5.1].
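In practice, the orbit distance of Remark 3.4 is an orthogonal Procrustes problem and can be computed from a single SVD. A minimal NumPy sketch (the function names are ours) computes $\tilde{Q}$ from an SVD of $Y^{T}\tilde{Z}$, checks that $Y^{T}\tilde{Z}\tilde{Q}$ is symmetric positive semidefinite, so that $\tilde{Z}\tilde{Q}\in\mathcal{H}_{Y}$, and compares the result against randomly drawn orthogonal matrices.

```python
import numpy as np

def orbit_distance(Y, Zt):
    """Return min_Q ||Y - Zt Q||_F over orthogonal Q and the minimizer (orthogonal Procrustes)."""
    U, _, Vt = np.linalg.svd(Y.T @ Zt)   # Y^T Zt = U diag(s) V^T = P Qt^T
    Qt = Vt.T @ U.T                      # polar factor: Qt = V U^T
    return np.linalg.norm(Y - Zt @ Qt, "fro"), Qt

def random_orthogonal(r, rng):
    Q, R = np.linalg.qr(rng.standard_normal((r, r)))
    return Q * np.sign(np.diag(R))       # column sign flips keep Q orthogonal

rng = np.random.default_rng(2)
n, r = 10, 3
Y, Zt = rng.standard_normal((n, r)), rng.standard_normal((n, r))
dist, Qt = orbit_distance(Y, Zt)

P = Y.T @ Zt @ Qt                        # equals U diag(s) U^T
in_horizontal = np.allclose(P, P.T) and np.linalg.eigvalsh((P + P.T) / 2).min() >= -1e-10
is_minimal = all(dist <= np.linalg.norm(Y - Zt @ random_orthogonal(r, rng), "fro") + 1e-12
                 for _ in range(1000))
print(in_horizontal, is_minimal)
```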

3.2 A time interval for the factorized problem

We now return to the factorized problem formulation (BM_$t$). Let $Y_{t}$ be an optimal solution of (BM_$t$) at some fixed time point $t$ (so that $Y_{t}Y_{t}^{T}=X_{t}$ and $\operatorname{rank}Y_{t}=r$). Based on the above propositions, we can state a result on the admissible time interval $[t,t+\Delta t]$ on which the factorized problem (BM_$t$), restricted to the horizontal space $\mathcal{H}_{Y_{t}}$, is guaranteed to admit unique solutions corresponding to those of the original problem (SDP_$t$). For this, exploiting the smoothness of the curve $t\mapsto X_{t}$, we assume that the constant

L:=\max_{t\in[0,T]}\|\dot{X}_{t}\|_{F},

(3.10)

which bounds the time derivative uniformly, as well as a uniform lower bound

\lambda_{r}(X_{t})\geq\lambda_{*}>0

(3.11)

on the smallest positive eigenvalue of $X_{t}$, are available for $t\in[0,T]$. Assuming such bounds entails no loss of generality: the existence of $L$ follows from (C1), which guarantees that $X_{t}$ is a smooth curve on the compact interval $[0,T]$, while the existence of $\lambda_{*}$ is guaranteed by (C2), since $X_{t}$ has constant rank, so that $t\mapsto\lambda_{r}(X_{t})$ is a continuous positive function on $[0,T]$.

Theorem 3.5.

Let $Y_{t}$ be a solution of (BM_$t$) as above. Then for $\Delta t<\frac{2\lambda_{*}}{L(\sqrt{r+4}+\sqrt{r})}$ there is a unique and smooth solution curve $s\mapsto Y_{s}$ for the problem (BM_$t$) restricted to $\mathcal{H}_{Y_{t}}$ in the time interval $s\in[t,t+\Delta t]$ .

Proof.

It suffices to show that for $s$ in the asserted time interval the solutions $X_{s}$ of (SDP_$s$) lie in the image $\phi(Y_{t}+\mathcal{B}_{Y_{t}})$. By Proposition 3.3, this is the case if $\|X_{s}-X_{t}\|_{F}<\frac{2\lambda_{r}}{\sqrt{r+4}+\sqrt{r}}$, where $\lambda_{r}$ is the smallest positive eigenvalue of $X_{t}$. Since

\|X_{s}-X_{t}\|_{F}\leq\int_{t}^{s}\|\dot{X}_{\tau}\|_{F}\;d\tau\leq L(s-t),

and $\lambda_{*}\leq\lambda_{r}$ , the condition $s-t<\frac{2\lambda_{*}}{L(\sqrt{r+4}+\sqrt{r})}$ is sufficient. Then Proposition 3.2 provides the smooth solution curve $Y_{s}=\phi^{-1}(X_{s})$ for problem (BM_$t$). ∎
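To make the step-size bound of Theorem 3.5 concrete, the sketch below uses a synthetic smooth curve $X_{t}=Y(t)Y(t)^{T}$ with $Y(t)=Y_{0}+tY_{1}$ (an assumption for illustration, not the solution path of any TV-SDP), estimates $L$ and $\lambda_{*}$ on a fine grid, and evaluates $\Delta t_{\max}=2\lambda_{*}/(L(\sqrt{r+4}+\sqrt{r}))$. Grid estimates only approximate the maximum in (3.10) and the minimum in (3.11), so in practice one would add a safety margin; here we use a factor $0.99$ when checking the key step of the proof.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r, T = 6, 2, 1.0
Y0, Y1 = rng.standard_normal((n, r)), 0.3 * rng.standard_normal((n, r))

def X(t):
    """Synthetic rank-r curve X_t = Y(t) Y(t)^T with Y(t) = Y0 + t Y1 (an assumption)."""
    Yt = Y0 + t * Y1
    return Yt @ Yt.T

def X_dot(t):
    """Exact derivative Y1 Y(t)^T + Y(t) Y1^T of the curve above."""
    Yt = Y0 + t * Y1
    return Y1 @ Yt.T + Yt @ Y1.T

grid = np.linspace(0.0, T, 2001)
L = max(np.linalg.norm(X_dot(t), "fro") for t in grid)      # grid estimate of (3.10)
lam_star = min(np.linalg.eigvalsh(X(t))[-r] for t in grid)  # grid estimate of (3.11)
dt_max = 2 * lam_star / (L * (np.sqrt(r + 4) + np.sqrt(r)))
print(f"L ~ {L:.3f}, lambda_* ~ {lam_star:.3f}, Theorem 3.5 allows dt < {dt_max:.4f}")

# Key step of the proof: X_s stays within the radius of Proposition 3.3 around X_t0.
t0 = 0.2
radius = 2 * np.linalg.eigvalsh(X(t0))[-r] / (np.sqrt(r + 4) + np.sqrt(r))
inside = all(np.linalg.norm(X(s) - X(t0), "fro") < radius
             for s in np.linspace(t0, t0 + 0.99 * dt_max, 50))
print(inside)
```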

The results of this section motivate the definition of a version of (BM_$t$) restricted to $\mathcal{H}_{Y_{t}}$ , which we provide in the next section.

4 Path following the trajectory of solutions

In this section, we present a path-following procedure for computing a sequence of approximate solutions $\{\hat{Y}_{0},\dots,\hat{Y}_{k},\dots,\hat{Y}_{K}\}$ at different time points that tracks a trajectory of solutions $t\mapsto Y_{t}$ to the Burer–Monteiro reformulation (BM_$t$). From this sequence we are then able to reconstruct a corresponding sequence of approximate solutions $\hat{X}_{k}=\hat{Y}_{k}\hat{Y}_{k}^{T}$ tracking the trajectory of solutions $t\mapsto X_{t}$ for the full space TV-SDP problem (SDP_$t$). The path-following method is based on iteratively solving the linearized KKT system. Given an iterate $Y_{t}$ on the path, we explained in the previous section how to eliminate the problem of nonuniqueness of the path in a small time interval $[t,t+\Delta t]$ by considering problem (BM_$t$) restricted to the horizontal space $\mathcal{H}_{Y_{t}}$ . We now need to ensure that this also guarantees that the linearized KKT system admits a unique solution. We show in Theorem 4.2 that this is indeed guaranteed under standard regularity assumptions on the original problem (SDP_$t$). This is a remarkable fact of somewhat independent interest.

4.1 Linearized KKT conditions and second-order sufficiency

Given an optimal solution $X_{t}=Y_{t}Y_{t}^{T}$ at time $t$ , we aim to find a solution $X_{t+\Delta t}=Y_{t+\Delta t}Y_{t+\Delta t}^{T}$ at time $t+\Delta t$ . By the results of the previous section, the next solution can be expressed in a unique way as

Y_{t+\Delta t}=Y_{t}+\Delta Y,

where $\Delta Y$ is in the horizontal space $\mathcal{H}_{Y_{t}}$ , provided that $\Delta t$ is small enough.

We define the following maps:

\begin{aligned}
f_{t+\Delta t}(Y)&\coloneqq\langle C_{t+\Delta t},YY^{T}\rangle,\\
g_{t+\Delta t}(Y)&\coloneqq\mathcal{A}_{t+\Delta t}(YY^{T})-b_{t+\Delta t},\\
h_{Y_{t}}(Y)&\coloneqq Y^{T}_{t}Y-Y^{T}Y_{t}.
\end{aligned}

(4.1)

By definition, $\Delta Y\in\mathcal{H}_{Y_{t}}$ if and only if $h_{Y_{t}}(\Delta Y)=0$. Since $h_{Y_{t}}$ is linear and $h_{Y_{t}}(Y_{t})=0$, this is equivalent to $h_{Y_{t}}(Y_{t}+\Delta Y)=0$, which we use for symmetry reasons (it reflects the fact that $Y_{t}\in\mathcal{H}_{Y_{t}}$, so that $Y_{t}+\mathcal{H}_{Y_{t}}=\mathcal{H}_{Y_{t}}$ is actually a linear space).
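The horizontal space $\mathcal{H}_{Y_{t}}=\{H: Y_{t}^{T}H=H^{T}Y_{t}\}$ is the Frobenius-orthogonal complement of the vertical directions $\{Y_{t}\Omega:\Omega^{T}=-\Omega\}$ tangent to the orbit $Y_{t}\mathcal{O}_{r}$. This is standard quotient-manifold background rather than a step of the method; the minimal NumPy sketch below (the helper name `horizontal_projection` is ours) projects an arbitrary $H$ onto $\mathcal{H}_{Y_{t}}$ by solving the Lyapunov equation $G\Omega+\Omega G=Y_{t}^{T}H-H^{T}Y_{t}$ with $G=Y_{t}^{T}Y_{t}$, and checks that $h_{Y_{t}}$ vanishes at $Y_{t}$ and at the projected direction.

```python
import numpy as np

def h(Yt, Y):
    """h_{Y_t}(Y) = Y_t^T Y - Y^T Y_t, as in (4.1)."""
    return Yt.T @ Y - Y.T @ Yt

def horizontal_projection(Yt, H):
    """Split H = H_h + Y_t Omega with H_h in H_{Y_t} and Omega skew-symmetric."""
    G = Yt.T @ Yt                                       # symmetric positive definite if rank Y_t = r
    d, V = np.linalg.eigh(G)
    R = V.T @ h(Yt, H) @ V                              # right-hand side in the eigenbasis of G
    Omega = V @ (R / (d[:, None] + d[None, :])) @ V.T   # solves G Omega + Omega G = h(Y_t, H)
    return H - Yt @ Omega, Omega

rng = np.random.default_rng(4)
Yt, H = rng.standard_normal((7, 3)), rng.standard_normal((7, 3))
Hh, Omega = horizontal_projection(Yt, H)
print(np.allclose(h(Yt, Yt), 0),                # Y_t itself is horizontal
      np.allclose(h(Yt, Hh), 0),                # projected direction is horizontal
      np.allclose(Omega, -Omega.T),             # removed part Y_t Omega is vertical
      abs(np.sum(Hh * (Yt @ Omega))) < 1e-10)   # and Frobenius-orthogonal to H_h
```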

To find the new iterate $Y_{t+\Delta t}$ we hence consider the problem

\begin{aligned}
\min_{Y\in\mathbb{R}^{n\times r}}\quad&f_{t+\Delta t}(Y)\\
\text{s.t.}\quad&g_{t+\Delta t}(Y)=0,\\
&h_{Y_{t}}(Y)=0.
\end{aligned}

(BM$_{Y_{t},t+\Delta t}$)

This is a quadratically constrained quadratic problem whose Lagrangian is

\mathcal{L}_{Y_{t},t+\Delta t}(Y,\lambda,\mu):=f_{t+\Delta t}(Y)-\langle\lambda,g_{t+\Delta t}(Y)\rangle-\langle\mu,h_{Y_{t}}(Y)\rangle

(4.2)

with multipliers $\lambda\in\mathbb{R}^{m}$ and $\mu\in\mathbb{S}^{r}_{skew}$ . The KKT conditions of problem (BM ${}_{Y_{t},t+\Delta t}$ ) are

\begin{aligned}
\nabla_{Y}\mathcal{L}_{Y_{t},t+\Delta t}(Y,\lambda,\mu)&=0,\\
g_{t+\Delta t}(Y)&=0,\\
h_{Y_{t}}(Y)&=0.
\end{aligned}

(4.3)

Hence, (4.3) reads explicitly as

\mathcal{F}_{Y_{t},t+\Delta t}(Y,\lambda,\mu)\coloneqq\begin{bmatrix}2C_{t+\Delta t}Y-2\mathcal{A}^{*}_{t+\Delta t}(\lambda)Y-2Y_{t}\mu\\ \mathcal{A}_{t+\Delta t}(YY^{T})-b_{t+\Delta t}\\ Y_{t}^{T}Y-Y^{T}Y_{t}\end{bmatrix}=0.
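As a quick consistency check of the first block of $\mathcal{F}_{Y_{t},t+\Delta t}$, that is, of $\nabla_{Y}\mathcal{L}_{Y_{t},t+\Delta t}$, one can compare it with central finite differences of the Lagrangian (4.2). In the sketch below, the constraint map $\mathcal{A}(X)=(\langle A_{i},X\rangle)_{i=1}^{m}$ with random symmetric $A_{i}$ (so that $\mathcal{A}^{*}(\lambda)=\sum_{i}\lambda_{i}A_{i}$), the data, and the multipliers are all hypothetical; since the Lagrangian is quadratic in $Y$, central differences are exact up to rounding.

```python
import numpy as np

rng = np.random.default_rng(5)
n, r, m = 5, 2, 3

def sym(M):
    return (M + M.T) / 2

C = sym(rng.standard_normal((n, n)))
A_list = [sym(rng.standard_normal((n, n))) for _ in range(m)]   # assumed A_i
b = rng.standard_normal(m)
Yt = rng.standard_normal((n, r))                               # anchor of h_{Y_t}
lam = rng.standard_normal(m)
W = rng.standard_normal((r, r))
mu = W - W.T                                                   # skew-symmetric multiplier

def A_op(X):
    return np.array([np.sum(Ai * X) for Ai in A_list])

def A_adj(l):
    return sum(li * Ai for li, Ai in zip(l, A_list))

def lagrangian(Y):
    """Lagrangian (4.2) with Frobenius inner products."""
    f = np.sum(C * (Y @ Y.T))
    g = A_op(Y @ Y.T) - b
    hY = Yt.T @ Y - Y.T @ Yt
    return f - lam @ g - np.sum(mu * hY)

def grad_formula(Y):
    """First block of F: 2 C Y - 2 A^*(lam) Y - 2 Y_t mu."""
    return 2 * C @ Y - 2 * A_adj(lam) @ Y - 2 * Yt @ mu

Y = rng.standard_normal((n, r))
eps = 1e-6
fd = np.zeros_like(Y)
for i in range(n):
    for j in range(r):
        E = np.zeros_like(Y)
        E[i, j] = eps
        fd[i, j] = (lagrangian(Y + E) - lagrangian(Y - E)) / (2 * eps)   # central difference
err = np.max(np.abs(fd - grad_formula(Y)))
print(f"max deviation: {err:.1e}")
```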

The linearization of (4.3) at $(Y_{t},\lambda_{t},\mu_{t})$ leads to a linear system

\mathcal{J}_{Y_{t},t+\Delta t}(Y_{t},\lambda_{t},\mu_{t})\begin{bmatrix}\Delta Y\\ \lambda_{t}+\Delta\lambda\\ \mu_{t}+\Delta\mu\end{bmatrix}=\begin{bmatrix}-\nabla_{Y}f_{t+\Delta t}(Y_{t})\\ g_{t+\Delta t}(Y_{t})\\ 0\end{bmatrix},

(4.4)

where $\mathcal{J}_{Y_{t},t+\Delta t}(Y,\lambda,\mu)$ denotes the derivative of $\mathcal{F}_{Y_{t},t+\Delta t}$ at $(Y,\lambda,\mu)$ . Note that it actually does not depend on $\mu$ , but we will keep this notation for consistency. As a linear operator on $\mathbb{R}^{n\times r}\times\mathbb{R}^{m}\times\mathbb{S}^{r}_{skew}$ , $\mathcal{J}_{Y_{t},t+\Delta t}(Y,\lambda,\mu)$ can be written in block matrix notation as follows,

\mathcal{J}_{Y_{t},t+\Delta t}(Y,\lambda,\mu):=\begin{bmatrix}\nabla^{2}_{Y}\mathcal{L}_{Y_{t},t+\Delta t}(\lambda)&-g^{\prime}_{t+\Delta t}(Y)^{*}&-h_{Y_{t}}^{*}\\ -g^{\prime}_{t+\Delta t}(Y)&0&0\\ -h_{Y_{t}}&0&0\end{bmatrix},

where from (4.1) and (4.2) one derives

\begin{aligned}
\nabla^{2}_{Y}\mathcal{L}_{Y_{t},t+\Delta t}(\lambda)&:H\mapsto 2(C_{t+\Delta t}-\mathcal{A}^{*}_{t+\Delta t}(\lambda))H,\\
g^{\prime}_{t+\Delta t}(Y)&:H\mapsto\mathcal{A}_{t+\Delta t}(YH^{T}+HY^{T}),\\
h_{Y_{t}}&:H\mapsto Y_{t}^{T}H-H^{T}Y_{t},\\
g^{\prime}_{t+\Delta t}(Y)^{*}&:\lambda\mapsto 2\mathcal{A}^{*}_{t+\Delta t}(\lambda)Y,\\
h^{*}_{Y_{t}}&:\mu\mapsto 2Y_{t}\mu.
\end{aligned}

For later reference, observe that as a bilinear form $\nabla^{2}_{Y}\mathcal{L}_{Y_{t},t+\Delta t}$ reads

\nabla^{2}_{Y}\mathcal{L}_{Y_{t},t+\Delta t}(\lambda)[H,H]=2\operatorname{trace}(H^{T}(C_{t+\Delta t}-\mathcal{A}^{*}_{t+\Delta t}(\lambda))H).
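The derivative blocks and adjoints listed above can be checked in the same spirit. The sketch below again uses hypothetical random symmetric constraint matrices $A_{i}$ and verifies $\langle g'(Y)H,\lambda\rangle=\langle H,g'(Y)^{*}\lambda\rangle$, $\langle h_{Y_{t}}(H),\mu\rangle_{F}=\langle H,h^{*}_{Y_{t}}(\mu)\rangle_{F}$ for skew-symmetric $\mu$ (the factor $2$ in $h^{*}_{Y_{t}}$ comes from this skew-symmetry), and the bilinear form of $\nabla^{2}_{Y}\mathcal{L}$.

```python
import numpy as np

rng = np.random.default_rng(6)
n, r, m = 6, 2, 4

def sym(M):
    return (M + M.T) / 2

C = sym(rng.standard_normal((n, n)))
A_list = [sym(rng.standard_normal((n, n))) for _ in range(m)]   # assumed A_i

def A_op(X):
    return np.array([np.sum(Ai * X) for Ai in A_list])

def A_adj(l):
    return sum(li * Ai for li, Ai in zip(l, A_list))

Y, Yt, H = (rng.standard_normal((n, r)) for _ in range(3))
lam = rng.standard_normal(m)
W = rng.standard_normal((r, r))
mu = W - W.T

gH = A_op(Y @ H.T + H @ Y.T)           # g'(Y) H
g_adj = 2 * A_adj(lam) @ Y             # g'(Y)^* lam
hH = Yt.T @ H - H.T @ Yt               # h_{Y_t}(H)
h_adj = 2 * Yt @ mu                    # h_{Y_t}^* mu
hess_H = 2 * (C - A_adj(lam)) @ H      # Hessian of the Lagrangian applied to H

checks = (np.isclose(gH @ lam, np.sum(H * g_adj)),
          np.isclose(np.sum(hH * mu), np.sum(H * h_adj)),
          np.isclose(np.sum(H * hess_H), 2 * np.trace(H.T @ (C - A_adj(lam)) @ H)))
print(checks)
```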

Solving (4.4) for the updated triple $(Y_{t}+\Delta Y,\lambda_{t}+\Delta\lambda,\mu_{t}+\Delta\mu)$ is equivalent to applying one step of Newton's method to the KKT system (4.3) (Lagrange–Newton method).
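To see the Lagrange–Newton step at work, the sketch below assembles the operator in (4.4) column by column and performs one step on a synthetic instance. Several simplifying assumptions are ours: the constraint is $\mathcal{A}(X)=\operatorname{diag}(X)$ with $b=\mathbf{1}$ (as in the standard max-cut relaxation), the data are frozen in time ($\Delta t=0$), $\mu_{t}=0$, and the instance is built with a known primal-dual pair by choosing $C=Z+\operatorname{Diag}(\lambda^{\star})$ with $Z\succeq 0$ and $ZY^{\star}=0$. A skew-symmetric $\mu$ is stored through its strict upper triangle, and the helper names `build_instance`, `residual` and `newton_step` are hypothetical. Dense assembly like this only makes sense for tiny sizes; it is not meant to reflect the implementation used in the experiments.

```python
import numpy as np

def build_instance(n, r, rng):
    """Hypothetical instance of min <C, X> s.t. diag(X) = 1 with a known KKT pair."""
    Ys = rng.standard_normal((n, r))
    Ys /= np.linalg.norm(Ys, axis=1, keepdims=True)       # unit rows, so diag(Ys Ys^T) = 1
    U, _, _ = np.linalg.svd(Ys)                           # full U; U[:, r:] spans ker(Ys^T)
    N = U[:, r:]
    Z = N @ np.diag(1.0 + rng.random(n - r)) @ N.T        # PSD, rank n - r, Z @ Ys = 0
    lam_s = rng.standard_normal(n)
    return Z + np.diag(lam_s), Ys, lam_s                  # C = Z + Diag(lam_s)

def residual(C, Y, lam):
    """res_t(Y, lam) from (RES) for A(X) = diag(X), b = 1."""
    stat = 2 * (C - np.diag(lam)) @ Y
    feas = np.sum(Y * Y, axis=1) - 1.0
    return max(np.abs(stat).max(), np.abs(feas).max())

def newton_step(C, Y, lam):
    """Solve (4.4) with Delta t = 0, anchor Y_t = Y and mu_t = 0; return (Y + dY, new lam)."""
    n, r = Y.shape
    iu = np.triu_indices(r, k=1)
    nr, q = n * r, len(iu[0])                             # q = dim of skew r x r matrices

    def apply_J(v):
        H, l = v[:nr].reshape(n, r), v[nr:nr + n]
        mu = np.zeros((r, r))
        mu[iu] = v[nr + n:]
        mu = mu - mu.T
        top = 2 * (C - np.diag(lam)) @ H - 2 * np.diag(l) @ Y - 2 * Y @ mu
        mid = -2 * np.sum(Y * H, axis=1)                  # -g'(Y) H = -diag(Y H^T + H Y^T)
        bot = -(Y.T @ H - H.T @ Y)[iu]                    # -h_Y(H), strict upper triangle
        return np.concatenate([top.ravel(), mid, bot])

    J = np.column_stack([apply_J(e) for e in np.eye(nr + n + q)])
    rhs = np.concatenate([(-2 * C @ Y).ravel(),           # -grad f(Y)
                          np.sum(Y * Y, axis=1) - 1.0,    # g(Y)
                          np.zeros(q)])                   # zero, since h_Y(Y) = 0
    sol = np.linalg.solve(J, rhs)
    return Y + sol[:nr].reshape(n, r), sol[nr:nr + n]

rng = np.random.default_rng(7)
C, Ys, lam_s = build_instance(6, 2, rng)
Y0 = Ys + 1e-5 * rng.standard_normal(Ys.shape)
lam0 = lam_s + 1e-5 * rng.standard_normal(lam_s.shape)
Y1, lam1 = newton_step(C, Y0, lam0)
res0, res1 = residual(C, Y0, lam0), residual(C, Y1, lam1)
print(f"residual before: {res0:.2e}, after one step: {res1:.2e}")
```

Starting this close to the known solution, the residual should drop by several orders of magnitude, as expected from the local quadratic convergence of Newton's method.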

Our aim in this subsection is to show that, for $\Delta t$ small enough, the system (4.4) is uniquely solvable when $(Y_{t},\lambda_{t})$ is a KKT pair for the overparametrized problem (BM_$t$). Since the system depends continuously on $\Delta t$, it suffices to show that it admits a unique solution for $\Delta t=0$. This corresponds to proving second-order sufficient conditions for the optimality of problem (BM$_{Y_{t},t+\Delta t}$) for $\Delta t=0$. Interestingly, it is possible to relate this to standard regularity hypotheses on the original semidefinite problem (SDP_$t$). For this we first need a uniqueness statement on the Lagrange multiplier $\lambda_{t}$.

Lemma 4.1.

Given an optimal solution $X_{t}=Y_{t}Y_{t}^{T}$ to (SDP_$t$), suppose that $X_{t}$ is a unique (see consequence (C1)), primal nondegenerate (see Definition 2.3 and assumption (A3)) solution. Then there is a unique optimal Lagrange multiplier $\lambda_{t}$ for (BM_$t$), independent of the choice of $Y_{t}$ in the orbit $Y_{t}\mathcal{O}_{r}$. Moreover, $Z(\lambda_{t})=C_{t}-\mathcal{A}_{t}^{*}(\lambda_{t})$ is the unique dual solution to (D-SDP_$t$).

Proof.

We start by recalling that the optimal set for (BM_$t$) coincides with $Y_{t}\mathcal{O}_{r}$ . Since the KKT conditions for (BM_$t$) are just

\nabla_{Y}f_{t}(Y)-\nabla_{Y}\langle\lambda,g_{t}(Y)\rangle=2(C_{t}-\mathcal{A}_{t}^{*}(\lambda))Y=0

(and $g_{t}(Y)=0$ ), the set of all optimal dual multipliers for (BM_$t$) is

\{\lambda\colon(C_{t}-\mathcal{A}_{t}^{*}(\lambda))Y_{t}Q=0,\ Q\in\mathcal{O}_{r}\}=\{\lambda\colon(C_{t}-\mathcal{A}_{t}^{*}(\lambda))Y_{t}=0\}.

To show that this set is a singleton, it suffices to prove that the homogeneous equation $\mathcal{A}^{*}_{t}(\lambda)Y_{t}=0$ has only the zero solution (the difference of any two elements of the set solves it). By (2.2), primal nondegeneracy of $X_{t}$ can be stated as

\operatorname{im}\mathcal{A}^{*}_{t}\cap\mathcal{T}^{\perp}_{X_{t}}=\{0\},

where $\mathcal{T}^{\perp}_{X_{t}}=\{M\in\mathbb{S}^{n}\mid MX_{t}=0\}$. Noticing that $\mathcal{A}^{*}_{t}(\lambda)Y_{t}=0$ implies $\mathcal{A}^{*}_{t}(\lambda)X_{t}=\mathcal{A}^{*}_{t}(\lambda)Y_{t}Y_{t}^{T}=0$, that is, $\mathcal{A}^{*}_{t}(\lambda)\in\operatorname{im}(\mathcal{A}^{*}_{t})\cap\mathcal{T}^{\perp}_{X_{t}}$, we get that $\mathcal{A}^{*}_{t}(\lambda)=0$ and thus $\lambda=0$, since $\mathcal{A}_{t}^{*}$ is injective by assumption (A2). To prove the second statement, observe that by primal nondegeneracy (D-SDP_$t$) has a unique solution $Z(w_{t})$, corresponding, by assumption (A2), to a unique dual multiplier vector $w_{t}$ (see Theorem 7 in [4]). Furthermore, $Z(w_{t})$ satisfies $Z(w_{t})X_{t}=\left(C_{t}-\mathcal{A}^{*}_{t}(w_{t})\right)Y_{t}Y_{t}^{T}=0$ by (2.1). Since $Y_{t}$ has full column rank if $r$ is chosen equal to $r^{*}=\operatorname{rank}X_{t}$, this implies that $\left(C_{t}-\mathcal{A}^{*}_{t}(w_{t})\right)Y_{t}=0$. From the first statement it then follows that $w_{t}=\lambda_{t}$. ∎

We can now state and prove the main result of this subsection.

Theorem 4.2.

Let $(X_{t}=Y_{t}Y_{t}^{T},Z_{t})$ be a strictly complementary (see Definition 2.2) optimal primal-dual pair of solutions to (SDP_$t$)-(D-SDP_$t$) such that $X_{t}$ is a primal nondegenerate solution. Let $\lambda_{t}$ be the unique corresponding Lagrange multiplier for (BM_$t$) according to Lemma 4.1. Then the triple $(Y_{t},\lambda_{t},\mu_{t}=0)$ is a KKT triple for (BM ${}_{Y_{t},t+\Delta t}$ ) at $\Delta t=0$ (that is, $\mathcal{F}_{Y_{t},t}(Y_{t},\lambda_{t},0)=0$ ) and fulfills the second-order sufficient conditions:

\nabla^{2}_{Y}\mathcal{L}_{Y_{t},t}(\lambda_{t})[H,H]=2\operatorname{trace}(H^{T}(C_{t}-\mathcal{A}^{*}_{t}(\lambda_{t}))H)>0

(4.5)

for all $H\in\mathbb{R}^{n\times r}\setminus\{0\}$ satisfying $\mathcal{A}_{t}(Y_{t}H^{T}+HY_{t}^{T})=0$ and $Y_{t}^{T}H-H^{T}Y_{t}=0$. In particular, $\mathcal{J}_{Y_{t},t}(Y_{t},\lambda_{t},0)$ is invertible.

Proof.

Since $(C_{t}-\mathcal{A}_{t}^{*}(\lambda_{t}))Y_{t}=Z(\lambda_{t})Y_{t}=0$ by the KKT conditions for (BM_$t$) and $h_{Y_{t}}(Y_{t})=0$, it is obvious that $\mathcal{F}_{Y_{t},t}(Y_{t},\lambda_{t},0)=0$. It is well known that the linearized KKT system (4.4) admits a unique solution if (and only if) the second-order sufficient conditions (4.5) hold; see, e.g., [43, Lemma 16.1]. Since $(X_{t},Z(\lambda_{t}))$ is an optimal solution for the original primal-dual pair of SDPs, it satisfies the second-order necessary conditions for optimality (that is, $Z(\lambda_{t})\succeq 0$), so (4.5) holds with “$\geq$”. Assume that

\operatorname{trace}(H^{T}(C_{t}-\mathcal{A}_{t}^{*}(\lambda_{t}))H)=\operatorname{trace}(H^{T}Z(\lambda_{t})H)=0

for some $H\in\mathbb{R}^{n\times r}$ satisfying $\mathcal{A}_{t}(Y_{t}H^{T}+HY_{t}^{T})=0$ and $Y_{t}^{T}H-H^{T}Y_{t}=0$ . Since $Z_{t}=Z(\lambda_{t})$ is positive semidefinite, the columns of $H$ must belong to the kernel of $Z(\lambda_{t})$ . By strict complementarity they hence belong to the column space of $X_{t}$ , which is equal to the column space of $Y_{t}$ . Therefore $H=Y_{t}P$ for some matrix $P\in\mathbb{R}^{r\times r}$ . Consider now the matrix

\tilde{X}=X_{t}+s(Y_{t}H^{T}+HY_{t}^{T})=Y_{t}[I_{r}+s(P^{T}+P)]Y_{t}^{T},

depending on a real parameter $s$. Clearly, $\mathcal{A}_{t}(\tilde{X})=b_{t}$ and, for $|s|$ small enough, $\tilde{X}$ is positive semidefinite. Furthermore, for a suitable choice of the sign of $s\neq 0$, we have $\langle C_{t},\tilde{X}\rangle\leq\langle C_{t},X_{t}\rangle$. Since $X_{t}$ is the unique solution of (SDP_$t$), this implies $\tilde{X}=X_{t}$, and thus $Y_{t}H^{T}+HY_{t}^{T}$ must be zero. Since $H\in\mathcal{H}_{Y_{t}}$, Proposition 3.1 yields $H=0$, which completes the proof. ∎

Corollary 4.3.

Let the assumptions of Theorem 4.2 be satisfied. Then for $\Delta t>0$ small enough (depending on $Y_{t}$), the system (4.4) is uniquely solvable, that is, the operator $\mathcal{J}_{Y_{t},t+\Delta t}(Y_{t},\lambda_{t},0)$ is invertible.

Clearly, this is only a qualitative result. An upper bound for feasible $\Delta t$ could be expressed in terms of the spectral norm of the inverse of $\mathcal{J}_{Y_{t},t}(Y_{t},\lambda_{t},0)$ using perturbation arguments. This would require a lower bound on the absolute value of the eigenvalues of $\mathcal{J}_{Y_{t},t}(Y_{t},\lambda_{t},0)$ . In this context, we should clarify that the eigenvalues, and hence also the condition number of $\mathcal{J}_{Y_{t},t+\Delta t}(Y_{t},\lambda_{t},0)$ (for sufficiently small $\Delta t$ as above), do not depend on the particular choice of $Y_{t}$ in the orbit $Y_{t}\mathcal{O}_{r}$ . This is obviously also relevant from a practical perspective. To see this, note that as a bilinear form (on $\mathbb{R}^{n\times r}\times\mathbb{R}^{m}\times\mathbb{S}^{r}_{skew}$ ) $\mathcal{J}_{Y_{t},t+\Delta t}(Y,\lambda,\mu)$ reads

\begin{aligned}
&\mathcal{J}_{Y_{t},t+\Delta t}(Y,\lambda,\mu)[(H,\Delta\lambda,\Delta\mu),(H,\Delta\lambda,\Delta\mu)]\\
&\quad=2\operatorname{trace}(H^{T}(C_{t+\Delta t}-\mathcal{A}^{*}_{t+\Delta t}(\lambda))H)-2\langle\Delta\lambda,\mathcal{A}_{t+\Delta t}(YH^{T}+HY^{T})\rangle-2\langle\Delta\mu,Y_{t}^{T}H-H^{T}Y_{t}\rangle.
\end{aligned}

For any fixed $Q\in\mathcal{O}_{r}$ one therefore has

\begin{aligned}
&\mathcal{J}_{Y_{t},t+\Delta t}(Y_{t},\lambda_{t},0)[(H,\Delta\lambda,\Delta\mu),(H,\Delta\lambda,\Delta\mu)]\\
&\quad=\mathcal{J}_{Y_{t}Q,t+\Delta t}(Y_{t}Q,\lambda_{t},0)[\mathcal{T}_{Q}(H,\Delta\lambda,\Delta\mu),\mathcal{T}_{Q}(H,\Delta\lambda,\Delta\mu)]
\end{aligned}

with the unitary linear operator $\mathcal{T}_{Q}(H,\Delta\lambda,\Delta\mu)=(HQ,\Delta\lambda,Q^{T}\Delta\mu Q)$ on $\mathbb{R}^{n\times r}\times\mathbb{R}^{m}\times\mathbb{S}^{r}_{skew}$ . It follows that $\mathcal{J}_{Y_{t},t+\Delta t}(Y_{t},\lambda_{t},0)$ and $\mathcal{J}_{Y_{t}Q,t+\Delta t}(Y_{t}Q,\lambda_{t},0)$ have the same eigenvalues.
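The orbit invariance of the spectrum can also be observed numerically. The sketch below assumes $\mathcal{A}(X)=\operatorname{diag}(X)$ (an illustrative choice), takes the horizontal-space anchor equal to the linearization point, and writes $\mathcal{J}_{Y,t}(Y,\lambda,0)$ in a Frobenius-orthonormal basis of $\mathbb{R}^{n\times r}\times\mathbb{R}^{n}\times\mathbb{S}^{r}_{skew}$, using $(E_{ij}-E_{ji})/\sqrt{2}$ for the skew part, so that the matrix is symmetric; it then compares the eigenvalues for $Y$ and $YQ$. Since the identity above holds for any $(Y,\lambda)$, random data suffice; the helper name `kkt_matrix` is ours.

```python
import numpy as np

def kkt_matrix(C, Y, lam):
    """Symmetric matrix of J_{Y,t}(Y, lam, 0) for the assumed constraint A(X) = diag(X)."""
    n, r = Y.shape
    iu = np.triu_indices(r, k=1)
    nr, q, s = n * r, len(iu[0]), np.sqrt(2.0)

    def apply_J(v):
        H, l = v[:nr].reshape(n, r), v[nr:nr + n]
        mu = np.zeros((r, r))
        mu[iu] = v[nr + n:] / s                  # coordinates w.r.t. (E_ij - E_ji) / sqrt(2)
        mu = mu - mu.T
        top = 2 * (C - np.diag(lam)) @ H - 2 * np.diag(l) @ Y - 2 * Y @ mu
        mid = -2 * np.sum(Y * H, axis=1)
        bot = -s * (Y.T @ H - H.T @ Y)[iu]       # same orthonormal coordinates for -h_Y(H)
        return np.concatenate([top.ravel(), mid, bot])

    return np.column_stack([apply_J(e) for e in np.eye(nr + n + q)])

rng = np.random.default_rng(8)
n, r = 6, 3
G = rng.standard_normal((n, n))
C = (G + G.T) / 2
Y, lam = rng.standard_normal((n, r)), rng.standard_normal(n)
Q, _ = np.linalg.qr(rng.standard_normal((r, r)))   # random orthogonal Q

J1, J2 = kkt_matrix(C, Y, lam), kkt_matrix(C, Y @ Q, lam)
same_spectrum = np.allclose(np.linalg.eigvalsh(J1), np.linalg.eigvalsh(J2))
print(np.allclose(J1, J1.T), same_spectrum)
```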

However, our proof of Theorem 4.2 is by contradiction and hence does not provide an obvious lower bound on the radius of invertibility of $\mathcal{J}_{Y_{t},t}(Y_{t},\lambda_{t},0)$. We do not investigate this further here; in the error analysis below, we essentially assume that such a bound is available (cf. Lemma 4.5).

4.2 A path-following predictor-corrector algorithm

We now describe in detail the path-following predictor-corrector algorithm that we propose for tracking the trajectory of solutions to (SDP_$t$). It includes an optional adaptive step-size tuning, which is based on measuring the residual of the optimality conditions, defined as

\operatorname{res}_{t}(Y,\lambda):=\left\|\begin{bmatrix}2[C_{t}-\mathcal{A}^{*}_{t}(\lambda)]Y\\ \mathcal{A}_{t}(YY^{T})-b_{t}\end{bmatrix}\right\|_{\infty}.

(RES)

The residual expresses the maximal componentwise violation of the KKT conditions for the problem (BM_$t$) and is therefore a suitable error measure. Indeed (see, e.g., [57, Theorems 3.1 and 3.2]), if the second-order sufficient condition for optimality holds at $(Y_{t},\lambda_{t})$, then there are constants $\eta,C_{1},C_{2}>0$ such that for all $(Y,\lambda)$ with $\|(Y,\lambda)-(Y_{t},\lambda_{t})\|\leq\eta$ one has

C_{1}\|(Y,\lambda)-(Y_{t},\lambda_{t})\|\leq\operatorname{res}_{t}(Y,\lambda)\leq C_{2}\|(Y,\lambda)-(Y_{t},\lambda_{t})\|.

Here and in the following, we use the norm $\|(Y,\lambda)\|^{2}=\|Y\|_{F}^{2}+\|\lambda\|^{2}$.

The overall procedure is displayed as Algorithm 1 below. Given a TV-SDP of the form (SDP_$t$), parameterized over a time interval $[0,T]$, the inputs are an approximate initial primal-dual solution pair $(\hat{X}_{0},Z(\hat{\lambda}_{0}))$ to (SDP_$0$)–(D-SDP_$0$) and an initial step size $\Delta t_{0}$. At each iteration, the current iterate is used to construct the linear system (4.4), which is then solved, returning the updates $\Delta Y$ and $\Delta\lambda$. The presented version of the algorithm also includes a step-size tuning procedure, which can be activated through the Boolean variable stepsize_TUNING and is intended to ensure that the residual threshold is satisfied at every time step. Specifically, if the threshold is violated at a time step, the step size is reduced by a factor $\gamma_{1}\in(0,1)$, and a more accurate solution is obtained by solving the linearized KKT system (4.4) for the reduced time step. Conversely, to avoid unnecessarily small steps, the step size is increased after every successful step by a factor $\gamma_{2}>1$ (but is never made larger than $\Delta t_{0}$). If step-size tuning is deactivated, the algorithm simply runs with the constant step size $\Delta t_{0}$. Note that Algorithm 1 tracks both the primal solution $X_{t}$ and the dual solution $Z_{t}=C_{t}-\mathcal{A}^{*}_{t}(\lambda_{t})$.

Algorithm 1 Path-following predictor-corrector for (SDP_$t$) with $t\in[0,T]$

Input: an initial approximate primal-dual solution $(\hat{X}_{0},Z(\hat{\lambda}_{0}))$ to (SDP_$0$)–(D-SDP_$0$)
initial step size $\Delta t_{0}$
Boolean variable stepsize_TUNING
step-size tuning parameters $\gamma_{1}\in(0,1)$, $\gamma_{2}>1$
residual tolerance $\epsilon>0$
Output: solutions $\{\hat{X}_{k}\}_{k=0,\dots,K}$ to (SDP_$t$) for $t\in\{0,\dots,t_{k},\dots,T\}$

1–4: $k\leftarrow 0$; $t_{0}\leftarrow 0$; $\Delta t\leftarrow\Delta t_{0}$; $S\leftarrow\{\hat{X}_{0}\}$; $r\leftarrow\operatorname{rank}(\hat{X}_{0})$
5: find $\hat{Y}_{0}\in\mathbb{R}^{n\times r}$ such that $\hat{Y}_{0}\hat{Y}_{0}^{T}=\hat{X}_{0}$
6: while $t_{k}<T$ do
7:   solve the linear system (4.4) with data $\Delta t,t_{k},\hat{Y}_{k},\hat{\lambda}_{k}$ and obtain $\Delta Y,\Delta\lambda$
8:   if stepsize_TUNING and $\operatorname{res}_{t_{k}+\Delta t}(\hat{Y}_{k}+\Delta Y,\hat{\lambda}_{k}+\Delta\lambda)>\epsilon$ then
9:     $\Delta t\leftarrow\gamma_{1}\Delta t$
10:    go back to step 6
11:  $(t_{k+1},\hat{Y}_{k+1},\hat{\lambda}_{k+1})\leftarrow(t_{k}+\Delta t,\hat{Y}_{k}+\Delta Y,\hat{\lambda}_{k}+\Delta\lambda)$
12:  append $\hat{X}_{k+1}=\hat{Y}_{k+1}\hat{Y}_{k+1}^{T}$ to $S$
13:  if stepsize_TUNING then
14:    $\Delta t\leftarrow\min(T-t_{k+1},\gamma_{2}\Delta t,\Delta t_{0})$
15:  else
16:    $\Delta t\leftarrow\min(T-t_{k+1},\Delta t)$
17:  $k\leftarrow k+1$
18: return $S$
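The control flow of Algorithm 1, in particular the reject-and-retry logic of steps 8–10 and the step-size updates of steps 13–16, can be sketched independently of the SDP data. In the sketch below, `step` and `residual` are hypothetical caller-supplied callables standing in for solving (4.4) and evaluating (RES), and the flag `tuning` plays the role of stepsize_TUNING; snapping the last step onto $T$ and the `dt_min` safeguard are our additions against floating-point drift and endless shrinking. The toy usage tracks $y(t)=\sqrt{1+t}$ with one Newton step per time step and is not an SDP.

```python
import math

def path_follow(step, residual, Y0, lam0, T, dt0, tuning=True,
                gamma1=0.5, gamma2=1.5, eps=1e-6, dt_min=1e-14):
    """Sketch of the outer loop of Algorithm 1; returns the list of accepted (t, Y, lam)."""
    t, dt = 0.0, min(dt0, T)
    Y, lam = Y0, lam0
    history = [(t, Y, lam)]
    while t < T:                                        # step 6
        Y_new, lam_new = step(t, dt, Y, lam)            # step 7: solve (4.4) at t + dt
        if tuning and residual(t + dt, Y_new, lam_new) > eps:
            if dt < dt_min:
                raise RuntimeError("step size fell below dt_min")
            dt *= gamma1                                # steps 8-10: shrink and retry
            continue
        t = T if T - t <= dt else t + dt                # step 11 (final step snapped onto T)
        Y, lam = Y_new, lam_new
        history.append((t, Y, lam))                     # step 12
        if tuning:                                      # steps 13-16
            dt = min(T - t, gamma2 * dt, dt0)
        else:
            dt = min(T - t, dt)
    return history

# Toy usage (not an SDP): track the root y(t) = sqrt(1 + t) of y^2 - (1 + t) = 0.
def toy_step(t, dt, y, lam):
    c = 1.0 + t + dt
    return y - (y * y - c) / (2.0 * y), lam            # one Newton step; lam is unused

def toy_residual(t, y, lam):
    return abs(y * y - (1.0 + t))

hist = path_follow(toy_step, toy_residual, 1.0, None, T=1.0, dt0=0.5)
t_end, y_end, _ = hist[-1]
print(len(hist), t_end, abs(y_end - math.sqrt(2.0)))
```

With tuning enabled, the step size settles where one Newton step keeps the residual below `eps`; with `tuning=False`, the loop simply takes constant steps of size `dt0`.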

4.3 Error analysis

We analyze the algorithm without step-size tuning. The main goal of the following error analysis is to show that the computed $(\hat{X}_{k},\hat{\lambda}_{k})$, where $\hat{X}_{k}=\hat{Y}_{k}\hat{Y}_{k}^{T}$, remain close to the exact solutions $(X_{t_{k}},\lambda_{t_{k}})$ if properly initialized. The logic of the proof is similar to that for standard path-following methods based on Newton's method, e.g., [22]. The specific form of our problem requires some additional considerations that allow for more precise quantitative bounds depending on the problem constants.

Throughout this section, $(X_{t}=Y_{t}Y_{t}^{T},Z_{t})$ is an optimal primal-dual pair of solutions to (SDP_$t$)–(D-SDP_$t$) satisfying the five assumptions (A1)–(A5), so that it is strictly complementary (see Definition 2.2) and $X_{t}$ is primal nondegenerate. Notice that the choice of the factor $Y_{t}$ is arbitrary, since it does not affect any of the subsequent statements. In Lemma 4.1 and its proof, we have seen that for every $X_{t}$ the unique Lagrange multiplier $\lambda_{t}$ satisfies $Z_{t}=C_{t}-\mathcal{A}^{*}_{t}(\lambda_{t})$, that is,

\lambda_{t}=(\mathcal{A}^{*}_{t})^{\dagger}(C_{t}-Z_{t})

with $(\mathcal{A}^{*}_{t})^{\dagger}$ being the pseudo-inverse of $\mathcal{A}^{*}_{t}$. By assumption (A5), $C_{t}$ and $\mathcal{A}^{*}_{t}$ depend smoothly on $t$, and so does $(\mathcal{A}_{t}^{*})^{\dagger}$, since $\mathcal{A}_{t}$ is surjective for all $t$ by assumption (A2). Also, by Theorem 2.4, $t\mapsto Z_{t}$ is smooth. Therefore the curve $t\mapsto\lambda_{t}$ is smooth. Since the algorithm operates in the $(Y,\lambda)$ space, our implicit goal is to show that the iterates stay close to the set

\mathcal{C}:=\{(Y_{t},\lambda_{t})\mid(Y_{t}Y_{t}^{T},Z(\lambda_{t}))\text{ is an optimal primal-dual pair to (SDP}_{t}\text{)–(D-SDP}_{t}\text{)},\ t\in[0,T]\}

containing the optimal primal-dual trajectories in the Burer–Monteiro factorization.
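Numerically, applying $(\mathcal{A}^{*}_{t})^{\dagger}$ amounts to a least-squares solve. Writing $\mathcal{A}^{*}_{t}(\lambda)=\sum_{i}\lambda_{i}A_{i}$, the multiplier is the least-squares solution of $\sum_{i}\lambda_{i}A_{i}=C_{t}-Z_{t}$ in the Frobenius norm, which is exact when the system is consistent and $\mathcal{A}^{*}_{t}$ is injective. The sketch below (random symmetric $A_{i}$ and a synthetic $Z$, both assumptions for illustration) uses `numpy.linalg.lstsq` on the vectorized matrices.

```python
import numpy as np

def multiplier_from_dual(A_list, C, Z):
    """lambda = (A^*)^+ (C - Z), via least squares on the vectorized matrices A_i."""
    B = np.column_stack([Ai.ravel() for Ai in A_list])   # matrix of A^* acting on lambda
    lam, *_ = np.linalg.lstsq(B, (C - Z).ravel(), rcond=None)
    return lam

rng = np.random.default_rng(9)
n, m = 5, 4
A_list = [(M + M.T) / 2 for M in rng.standard_normal((m, n, n))]   # assumed A_i
lam_true = rng.standard_normal(m)
F = rng.standard_normal((n, n))
Z = F @ F.T                                               # any symmetric Z works for this check
C = Z + sum(li * Ai for li, Ai in zip(lam_true, A_list))  # so that Z = C - A^*(lam_true)
lam = multiplier_from_dual(A_list, C, Z)
print(np.allclose(lam, lam_true))
```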

Lemma 4.4.

The set $\mathcal{C}$ is compact.

Proof.

Since $\|Y_{t}\|_{F}=\sqrt{\operatorname{trace}(X_{t})}$ and both $t\mapsto X_{t}$ and $t\mapsto\lambda_{t}$ are continuous on the compact interval $[0,T]$, the set $\mathcal{C}$ is bounded. To see that it is closed, let $((Y^{(j)},\lambda_{t_{j}}))_{j}\subset\mathcal{C}$ be a convergent sequence with limit $(Y,\lambda)$, where $Y^{(j)}(Y^{(j)})^{T}=X_{t_{j}}$ for some $t_{j}\in[0,T]$. By passing to a subsequence, we can assume $t_{j}\to t\in[0,T]$. By continuity, $X_{t}=\lim_{j}Y^{(j)}(Y^{(j)})^{T}=YY^{T}$ and $\lambda=\lim_{j}\lambda_{t_{j}}=\lambda_{t}$, which shows that $(Y,\lambda)\in\mathcal{C}$. ∎

We consider the norm on $\mathbb{R}^{n\times r}\times\mathbb{R}^{m}\times\mathbb{S}^{r}_{skew}$ defined by $\|(Y,\lambda,\mu)\|^{2}=\|Y\|_{F}^{2}+\|\lambda\|^{2}+\|\mu\|_{F}^{2}$ . The induced operator norm is denoted $\|\cdot\|_{op}$ .

Lemma 4.5.

There exists a constant $m>0$ such that

\|\mathcal{J}_{Y_{t},t}(Y_{t},\lambda_{t},0)^{-1}\|_{op}\leq\frac{1}{m}

(4.6)

for all $(Y_{t},\lambda_{t})\in\mathcal{C}$ .

Proof.

On its open domain of definition, the map $(Y,\lambda)\mapsto\|\mathcal{J}(Y,\lambda,0)^{-1}\|_{op}$ is continuous. By Theorem 4.2, the compact set $\mathcal{C}$ is contained in that domain. Therefore, $\|\mathcal{J}(Y,\lambda,0)^{-1}\|_{op}$ achieves its maximum on $\mathcal{C}$ . ∎

Lemma 4.6.

For any $t\in[0,T]$ and $\hat{Y}\in\mathbb{R}^{n\times r}$, the map $(Y,\lambda,\mu)\mapsto\mathcal{J}_{\hat{Y},t}(Y,\lambda,\mu)$ is Lipschitz continuous in the operator norm on $\mathbb{R}^{n\times r}\times\mathbb{R}^{m}\times\mathbb{S}^{r}_{skew}$. Specifically,

\|\mathcal{J}_{\hat{Y},t}(Y_{1},\lambda_{1},\mu_{1})-\mathcal{J}_{\hat{Y},t}(Y_{2},\lambda_{2},\mu_{2})\|_{op}\leq 12\sqrt{3}\|\mathcal{A}_{t}\|\|(Y_{1},\lambda_{1},\mu_{1})-(Y_{2},\lambda_{2},\mu_{2})\|

for all $(Y_{1},\lambda_{1},\mu_{1})$ and $(Y_{2},\lambda_{2},\mu_{2})$ , where $\|\mathcal{A}_{t}\|$ is the operator norm of $\mathcal{A}_{t}$ .

Proof.

It follows from (4.1) that as a bilinear form one has

\begin{aligned}
&\bigl|(\mathcal{J}_{\hat{Y},t}(Y_{1},\lambda_{1},\mu_{1})-\mathcal{J}_{\hat{Y},t}(Y_{2},\lambda_{2},\mu_{2}))[(H,\Delta\lambda,\Delta\mu),(H,\Delta\lambda,\Delta\mu)]\bigr|\\
&\quad=\bigl|2\operatorname{trace}(H^{T}\mathcal{A}^{*}_{t}(\lambda_{2}-\lambda_{1})H)-2(\Delta\lambda)^{T}\mathcal{A}_{t}((Y_{1}-Y_{2})H^{T}+H(Y_{1}-Y_{2})^{T})\bigr|\\
&\quad\leq 2\|\mathcal{A}_{t}\|\|\lambda_{1}-\lambda_{2}\|\|H\|_{F}^{2}+4\|\mathcal{A}_{t}\|\|Y_{1}-Y_{2}\|_{F}\|H\|_{F}\|\Delta\lambda\|\\
&\quad\leq(2\|\mathcal{A}_{t}\|\|\lambda_{1}-\lambda_{2}\|+4\|\mathcal{A}_{t}\|\|Y_{1}-Y_{2}\|_{F})(\|H\|_{F}+\|\Delta\lambda\|+\|\Delta\mu\|_{F})^{2}\\
&\quad\leq 4\|\mathcal{A}_{t}\|(\|Y_{1}-Y_{2}\|_{F}+\|\lambda_{1}-\lambda_{2}\|+\|\mu_{1}-\mu_{2}\|_{F})(\|H\|_{F}+\|\Delta\lambda\|+\|\Delta\mu\|_{F})^{2}\\
&\quad\leq 12\sqrt{3}\|\mathcal{A}_{t}\|\|(Y_{1},\lambda_{1},\mu_{1})-(Y_{2},\lambda_{2},\mu_{2})\|\|(H,\Delta\lambda,\Delta\mu)\|^{2}.
\end{aligned}

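In the last step we used $x_{1}+x_{2}+x_{3}\leq\sqrt{3}\,(x_{1}^{2}+x_{2}^{2}+x_{3}^{2})^{1/2}$ for nonnegative $x_{i}$, applied to both factors. Since the difference of the two operators is self-adjoint, this bound on its quadratic form bounds its operator norm, which proves the claim. ∎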
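The constant $12\sqrt{3}$ in Lemma 4.6 is not meant to be sharp, and it can be compared with what is observed on random data. The sketch below assumes $\mathcal{A}(X)=\operatorname{diag}(X)$, for which $\|\mathcal{A}\|=1$, builds $\mathcal{J}_{\hat{Y},t}(Y,\lambda,\mu)$ in Frobenius-orthonormal coordinates (it does not depend on $\mu$), and records the ratio of $\|\mathcal{J}_{1}-\mathcal{J}_{2}\|_{op}$ to the distance of the arguments. The data and the helper name `kkt_matrix` are illustrative.

```python
import numpy as np

def kkt_matrix(C, Yh, Y, lam):
    """Symmetric matrix of J_{Yh,t}(Y, lam, mu) for the assumed constraint A(X) = diag(X)
    (so ||A|| = 1), in Frobenius-orthonormal coordinates; it does not depend on mu."""
    n, r = Y.shape
    iu = np.triu_indices(r, k=1)
    nr, q, s = n * r, len(iu[0]), np.sqrt(2.0)

    def apply_J(v):
        H, l = v[:nr].reshape(n, r), v[nr:nr + n]
        mu = np.zeros((r, r))
        mu[iu] = v[nr + n:] / s
        mu = mu - mu.T
        top = 2 * (C - np.diag(lam)) @ H - 2 * np.diag(l) @ Y - 2 * Yh @ mu
        mid = -2 * np.sum(Y * H, axis=1)
        bot = -s * (Yh.T @ H - H.T @ Yh)[iu]
        return np.concatenate([top.ravel(), mid, bot])

    return np.column_stack([apply_J(e) for e in np.eye(nr + n + q)])

rng = np.random.default_rng(10)
n, r = 6, 3
G = rng.standard_normal((n, n))
C, Yh = (G + G.T) / 2, rng.standard_normal((n, r))
ratios = []
for _ in range(50):
    Y1, Y2 = rng.standard_normal((n, r)), rng.standard_normal((n, r))
    l1, l2 = rng.standard_normal(n), rng.standard_normal(n)
    W = rng.standard_normal((r, r))
    dmu = W - W.T                                   # plays the role of mu_1 - mu_2
    dist = np.sqrt(np.sum((Y1 - Y2) ** 2) + np.sum((l1 - l2) ** 2) + np.sum(dmu ** 2))
    gap = np.linalg.norm(kkt_matrix(C, Yh, Y1, l1) - kkt_matrix(C, Yh, Y2, l2), 2)
    ratios.append(gap / dist)
print(f"largest observed ratio {max(ratios):.2f} vs. 12*sqrt(3) = {12 * np.sqrt(3):.2f}")
```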

Since $t\mapsto\mathcal{A}_{t}$ is assumed to be continuous, the constant $M=\max_{t\in[0,T]}12\sqrt{3}\|\mathcal{A}_{t}\|$ satisfies the uniform Lipschitz condition

\|\mathcal{J}_{\hat{Y},t}(Y_{1},\lambda_{1},\mu_{1})-\mathcal{J}_{\hat{Y},t}(Y_{2},\lambda_{2},\mu_{2})\|_{op}\leq M\|(Y_{1},\lambda_{1},\mu_{1})-(Y_{2},\lambda_{2},\mu_{2})\|

(4.7)

for all $(Y_{1},\lambda_{1},\mu_{1})$ and $(Y_{2},\lambda_{2},\mu_{2})$, independent of the choice of $\hat{Y}\in\mathbb{R}^{n\times r}$. In what follows, we use (4.6) and (4.7) without further investigating the sharpest possible bounds.

In addition, let $\lambda_{r}(X_{t})\geq\lambda_{*}>0$ be a uniform lower bound on the smallest positive eigenvalue as in (3.11). Furthermore, we now also assume a uniform upper bound

\|Y_{t}\|_{2}=\sqrt{\lambda_{1}(X_{t})}\leq\sqrt{\Lambda_{*}}

on the spectral norm of $Y_{t}$. Finally, let $\|\dot{X}_{t}\|_{F}\leq L$ as in (3.10). Since the curve $t\mapsto\lambda_{t}$ is smooth, the constant

K:=\max_{t\in[0,T]}\|\dot{\lambda}_{t}\|

(4.8)

is also well-defined.

With the necessary constants at hand, we are now in the position to state our main result on the error analysis. The following theorem shows that we can bound the distance between the iterates of Algorithm 1 and the set of solutions to (BM_$t$) provided the initial point is close enough to the set of initial solutions and the step size $\Delta t$ is small enough. Here we employ again the natural distance measure $\min_{Q\in\mathcal{O}_{r}}\|\hat{Y}-YQ\|_{F}$ between the orbits $\hat{Y}\mathcal{O}_{r}$ and $Y\mathcal{O}_{r}$ , cf. Remark 3.4.

Theorem 4.7.

Let $\delta>0$ and $\Delta t>0$ be small enough such that the following three conditions are satisfied:

(2\sqrt{\Lambda_{*}}+\delta)\delta+L\Delta t<\frac{2\lambda_{*}}{\sqrt{r+4}+\sqrt{r}},

(4.9)

\delta<\frac{2}{3}\frac{m}{M},

(4.10)

\left[\frac{1}{\lambda_{*}}((2\sqrt{\Lambda_{*}}+\delta)\delta+L\Delta t)^{2}+\sqrt{r}(2\sqrt{\Lambda_{*}}+\delta)\delta+L\Delta t\right]^{2}+(\delta+K\Delta t)^{2}\leq\frac{2}{3}\frac{m}{M}\delta.

(4.11)

Assume for the initial point $(\hat{Y}_{0},\hat{\lambda}_{0})$ that

\min_{Q\in\mathcal{O}_{r}}\|(\hat{Y}_{0},\hat{\lambda}_{0})-(Y_{0}Q,\lambda_{0})\|\leq\delta.

(4.12)

Then Algorithm 1 is well-defined and for all $t_{k+1}=t_{k}+\Delta t$ the iterates satisfy

\min_{Q\in\mathcal{O}_{r}}\|(\hat{Y}_{k},\hat{\lambda}_{k})-(Y_{t_{k}}Q,\lambda_{t_{k}})\|\leq\delta.

It then holds that

\|\hat{X}_{k}-X_{t_{k}}\|_{F}\leq(2\sqrt{\Lambda_{*}}+\delta)\delta

for all $t_{k}$ .

Notice that the left side of (4.11) is $O(\delta^{2}+\Delta t^{2})$ for $\delta,\Delta t\to 0$, whereas the right side is only $O(\delta)$. Therefore, for $\delta$ small enough and $\Delta t$ small enough relative to $\delta$ (for instance, $\Delta t$ proportional to $\delta$), (4.11) will be satisfied. Furthermore, a sufficient condition for (4.12) to hold is that

\|\hat{\lambda}_{0}-\lambda_{0}\|\leq\frac{\delta}{\sqrt{2}}

and

\|\hat{X}_{0}-X_{0}\|_{F}\leq\frac{\sqrt{r\lambda_{*}^{2}+2\sqrt{2}\delta\lambda_{*}}-\sqrt{r\lambda_{*}^{2}}}{2},

which easily follows from (3.9).

Proof.

We will investigate one step of the algorithm and apply an induction hypothesis that at time point $t=t_{k}$ there exists $(Y_{t},\lambda_{t})\in\mathcal{C}$ satisfying

\|(\hat{Y}_{t},\hat{\lambda}_{t})-(Y_{t},\lambda_{t})\|_{F}\leq\delta.

We aim to show that for sufficiently small $\delta>0$ and $\Delta t>0$ the next iterate $(\hat{Y}_{t+\Delta t},\hat{\lambda}_{t+\Delta t})$ in the algorithm is well-defined and satisfies the same estimate

\|(\hat{Y}_{t+\Delta t},\hat{\lambda}_{t+\Delta t})-(Y_{t+\Delta t},\lambda_{t+\Delta t})\|_{F}\leq\delta

with an exact solution $(Y_{t+\Delta t},\lambda_{t+\Delta t})\in\mathcal{C}$ . The proof of the theorem then follows by induction over the steps in the algorithm.

We first claim that there exists an exact solution $Y_{t+\Delta t}$ in the horizontal space of $\hat{Y}_{t}$ , that is, $X_{t+\Delta t}=Y_{t+\Delta t}Y_{t+\Delta t}^{T}$ and $h_{\hat{Y}_{t}}(Y_{t+\Delta t})=0$ . Indeed, using (4.9) we have

\begin{aligned}
\|\hat{X}_{t}-X_{t}\|_{F}&=\|(\hat{Y}_{t}-Y_{t})\hat{Y}^{T}_{t}+Y_{t}(\hat{Y}_{t}-Y_{t})^{T}\|_{F}\\
&\leq(\|Y_{t}\|_{2}+\|\hat{Y}_{t}\|_{2})\delta\leq(2\sqrt{\Lambda_{*}}+\delta)\delta<\frac{2\lambda_{*}}{\sqrt{r+4}+\sqrt{r}}-L\Delta t.
\end{aligned}

This yields

\|\hat{X}_{t}-X_{t+\Delta t}\|_{F}\leq\|\hat{X}_{t}-X_{t}\|_{F}+\|X_{t}-X_{t+\Delta t}\|_{F}<\frac{2\lambda_{*}}{\sqrt{r+4}+\sqrt{r}}.

Thus, Proposition 3.3 states the existence of $Y_{t+\Delta t}$ as desired. We note for later use that by (3.9) it satisfies

\begin{aligned}
\|\hat{Y}_{t}-Y_{t+\Delta t}\|_{F}&\leq\frac{1}{\lambda_{*}}\|\hat{X}_{t}-X_{t+\Delta t}\|_{F}^{2}+\sqrt{r}\|\hat{X}_{t}-X_{t+\Delta t}\|_{F}\\
&\leq\frac{1}{\lambda_{*}}(\|\hat{X}_{t}-X_{t}\|_{F}+L\Delta t)^{2}+\sqrt{r}\|\hat{X}_{t}-X_{t}\|_{F}+L\Delta t\\
&\leq\frac{1}{\lambda_{*}}((2\sqrt{\Lambda_{*}}+\delta)\delta+L\Delta t)^{2}+\sqrt{r}(2\sqrt{\Lambda_{*}}+\delta)\delta+L\Delta t.
\end{aligned}

(4.13)

The matrix $Y_{t+\Delta t}$ is an exact solution of (BM$_{t+\Delta t}$), and by Theorem 4.2 there is a unique Lagrange multiplier $\lambda_{t+\Delta t}$ such that $\mathcal{F}_{\hat{Y}_{t},t+\Delta t}(Y_{t+\Delta t},\lambda_{t+\Delta t},0)=0$. By construction, the next iterate $(\hat{Y}_{t+\Delta t},\hat{\lambda}_{t+\Delta t},\hat{\mu}_{t+\Delta t})$ in the algorithm is obtained from one step of Newton's method for solving this equation with starting point $(\hat{Y}_{t},\hat{\lambda}_{t},0)$. In light of (4.6) and (4.7), standard results on Newton's method (e.g., Theorem 1.2.5 in [42]) yield that under the condition

\|(\hat{Y}_{t},\hat{\lambda}_{t},0)-(Y_{t+\Delta t},\lambda_{t+\Delta t},0)\|_{F}\leq\varepsilon<\frac{2}{3}\frac{m}{M}

one step of the method is well-defined, i.e., $\mathcal{J}_{\hat{Y}_{t},t+\Delta t}(\hat{Y}_{t},\hat{\lambda}_{t},0)$ is invertible, and satisfies

$$\|(\hat{Y}_{t+\Delta t},\hat{\lambda}_{t+\Delta t},\hat{\mu}_{t+\Delta t})-(Y_{t+\Delta t},\lambda_{t+\Delta t},0)\|_{F}\leq\frac{3}{2}\frac{M}{m}\|(\hat{Y}_{t},\hat{\lambda}_{t},0)-(Y_{t+\Delta t},\lambda_{t+\Delta t},0)\|_{F}^{2}.$$

In particular, choosing $\varepsilon=\left(\frac{2}{3}\frac{m}{M}\delta\right)^{1/2}$ yields the desired result

$$\|(\hat{Y}_{t+\Delta t},\hat{\lambda}_{t+\Delta t})-(Y_{t+\Delta t},\lambda_{t+\Delta t})\|_{F}\leq\frac{3}{2}\frac{M}{m}\varepsilon^{2}=\delta.$$
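The choice of $\varepsilon$ can be checked directly. Since all quantities involved are positive, the same short computation also shows that the strict inequality $\left(\frac{2}{3}\frac{m}{M}\delta\right)^{1/2}<\frac{2}{3}\frac{m}{M}$ required below is equivalent to $\delta<\frac{2}{3}\frac{m}{M}$:

$$\frac{3}{2}\frac{M}{m}\varepsilon^{2}=\frac{3}{2}\frac{M}{m}\cdot\frac{2}{3}\frac{m}{M}\,\delta=\delta,\qquad\Big(\frac{2}{3}\frac{m}{M}\delta\Big)^{1/2}<\frac{2}{3}\frac{m}{M}\iff\frac{2}{3}\frac{m}{M}\,\delta<\Big(\frac{2}{3}\frac{m}{M}\Big)^{2}\iff\delta<\frac{2}{3}\frac{m}{M}.$$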

Therefore, we need to ensure that

$$\|(\hat{Y}_{t},\hat{\lambda}_{t})-(Y_{t+\Delta t},\lambda_{t+\Delta t})\|_{F}\leq\left(\frac{2}{3}\frac{m}{M}\delta\right)^{1/2}<\frac{2}{3}\frac{m}{M}$$

is satisfied. Here the second inequality is precisely condition (4.10). We now show that (4.11) is a sufficient condition for the first inequality. Using (4.8), we have

$$\|\hat{\lambda}_{t}-\lambda_{t+\Delta t}\|^{2}\leq\big(\|\hat{\lambda}_{t}-\lambda_{t}\|+K\Delta t\big)^{2}\leq(\delta+K\Delta t)^{2}.$$

Together with (4.13) this gives

$$\begin{aligned}
&\|\hat{Y}_{t}-Y_{t+\Delta t}\|_{F}^{2}+\|\hat{\lambda}_{t}-\lambda_{t+\Delta t}\|^{2}\\
&\quad\leq\left[\frac{1}{\lambda_{*}}\big((2\sqrt{\Lambda_{*}}+\delta)\delta+L\Delta t\big)^{2}+\sqrt{r}\,(2\sqrt{\Lambda_{*}}+\delta)\delta+L\Delta t\right]^{2}+(\delta+K\Delta t)^{2}.
\end{aligned}$$

Now (4.11) ensures the desired estimate for the right-hand side and the proof is completed. ∎
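The first estimate in the proof, $\|\hat{X}_{t}-X_{t}\|_{F}\leq(\|Y_{t}\|_{2}+\|\hat{Y}_{t}\|_{2})\delta\leq(2\sqrt{\Lambda_{*}}+\delta)\delta$, rests on the splitting $\hat{Y}\hat{Y}^{T}-YY^{T}=(\hat{Y}-Y)\hat{Y}^{T}+Y(\hat{Y}-Y)^{T}$ and on $\|AB\|_{F}\leq\|A\|_{F}\|B\|_{2}$. The following Python/NumPy sketch checks both on random data. It is only an illustration, not part of the authors' code, and for this single random instance it assumes $\delta=\|\hat{Y}-Y\|_{F}$ and $\Lambda_{*}=\lambda_{\max}(YY^{T})$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 50, 5
Y = rng.standard_normal((n, r))                 # stands in for an exact factor Y_t
Y_hat = Y + 1e-2 * rng.standard_normal((n, r))  # stands in for the iterate hat{Y}_t

E = Y_hat - Y
diff = Y_hat @ Y_hat.T - Y @ Y.T                # hat{X}_t - X_t

# Splitting used in the proof: (Y_hat - Y) Y_hat^T + Y (Y_hat - Y)^T
split_err = np.linalg.norm(E @ Y_hat.T + Y @ E.T - diff, "fro")

delta = np.linalg.norm(E, "fro")                # assumption: delta = ||Y_hat - Y||_F
Lambda_star = np.linalg.norm(Y, 2) ** 2         # assumption: largest eigenvalue of Y Y^T

lhs = np.linalg.norm(diff, "fro")
mid = (np.linalg.norm(Y, 2) + np.linalg.norm(Y_hat, 2)) * delta
rhs = (2.0 * np.sqrt(Lambda_star) + delta) * delta

print(f"split error {split_err:.1e}, {lhs:.4f} <= {mid:.4f} <= {rhs:.4f}")
```

The second inequality holds because $\|\hat{Y}\|_{2}\leq\|Y\|_{2}+\|\hat{Y}-Y\|_{2}\leq\sqrt{\Lambda_{*}}+\delta$.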

5 Numerical experiments on Time-Varying Max Cut

In this section, we compare the tracking of the trajectory of solutions to TV-SDP via Algorithm 1 with interior-point methods (IPMs) used to track the same trajectory by solving the problem at discrete time points. In our experiments, we used the implementation of the homogeneous and self-dual algorithm [6, 24] in the MOSEK Optimization Suite, version 9.3 [40]. Furthermore, to provide a comparison with an alternative warm-start approach, we performed numerical experiments using the Splitting Conic Solver (SCS), version 3.2.2 [46]. This package implements the first-order method presented in [44, 45], which applies an operator splitting method, the alternating direction method of multipliers (ADMM), to the homogeneous self-dual embedding. We show that, in terms of both accuracy and runtime, the algorithm proposed in this paper can outperform repeated runs of an IPM for time-invariant SDPs as well as warm-started SCS.

Given a weighted graph $\mathcal{G}=(V,E)$, the Max-Cut problem, a well-known problem in graph theory, asks for a binary partition of the vertices in $V$ (also known as a cut) of maximal weight, where the weight of a cut is the sum of the weights of the edges in $E$ connecting the two subsets of the partition. This problem can be formulated as the following quadratically constrained quadratic program:

$$\begin{aligned}
\max_{x\in\mathbb{R}^{n}}\quad&\sum_{i,j=1}^{n}w_{i,j}(1-x_{i}x_{j})\\
\text{s.t.}\quad&x_{i}^{2}=1\quad\text{for all }i\in\{1,\dots,n\},
\end{aligned}\tag{MC}$$

where $n=|V|$ is the number of vertices of the graph, $w_{i,j}$ is the weight of the edge connecting vertices $i$ and $j$ (with $w_{i,j}=0$ if there is no such edge), and the variable $x_{i}\in\{1,-1\}$ indicates the subset to which vertex $i$ is assigned. Replacing $xx^{T}$ with a positive semidefinite matrix $X$ with unit diagonal and dropping the rank-one condition relaxes this problem to an SDP of the form

$$\begin{aligned}
\min_{X\in\mathbb{S}^{n}}\quad&\langle W,X\rangle\\
\text{s.t.}\quad&X_{i,i}=1\quad\text{for all }i\in\{1,\dots,n\},\\
&X\succeq 0,
\end{aligned}\tag{MCR}$$

where $W$ is the weight matrix with entries $w_{i,j}$; see [25]. Note that the number of constraints equals the dimension $n$ of the variable matrix. Randomized rounding algorithms for (MC) based on the convex relaxation (MCR) deliver, for nonnegative weights, solutions with an expected performance ratio of about $0.878$ [25], and, assuming the Unique Games Conjecture, no polynomial-time algorithm achieves a better constant ratio for (MC).
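Given a factor $Y$ with unit-norm rows, so that $X=YY^{T}$ is feasible for (MCR), a cut can be extracted by random-hyperplane rounding in the style of [25]. The Python/NumPy sketch below is illustrative only: the function names and the number of trials are our own choices, and `cut_value` uses the scaling of (MC) as written, which counts every edge twice and therefore equals four times the weight of the cut.

```python
import numpy as np

def cut_value(W, x):
    """Objective of (MC) for a sign vector x in {-1, 1}^n."""
    return float(np.sum(W * (1.0 - np.outer(x, x))))

def hyperplane_rounding(W, Y, trials=100, seed=0):
    """Round a Burer-Monteiro factor Y (X = Y Y^T) to a cut.

    Each trial draws a random Gaussian normal vector g and assigns vertex i
    to the side given by the sign of <y_i, g>; the best trial is returned.
    """
    rng = np.random.default_rng(seed)
    best_x, best_val = None, -np.inf
    for _ in range(trials):
        g = rng.standard_normal(Y.shape[1])
        x = np.where(Y @ g >= 0.0, 1.0, -1.0)
        val = cut_value(W, x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val

# Usage: a 4-cycle with unit weights and the rank-one factor of its optimal cut.
W = np.zeros((4, 4))
for i in range(4):
    W[i, (i + 1) % 4] = W[(i + 1) % 4, i] = 1.0
Y = np.array([[1.0], [-1.0], [1.0], [-1.0]])
x, val = hyperplane_rounding(W, Y)
print(val)  # 16.0, i.e., 4 times the cut weight 4
```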

In this paper, we adopt a time-varying version of (MCR) as a benchmark, in which the data matrix depends on a time parameter $t\in[0,1]$ through a map $t\mapsto W_{t}\in\mathbb{S}^{n}$. (This differs from the recently studied variant [37, 31] with edge insertions and deletions, which can be seen as discontinuous functions of time.)

In our experiment, $W_{t}$ is obtained as a random linear perturbation of a sparse weight matrix with density $50\%$ . Specifically,

$$W_{t}=W_{0}+tW_{1},$$

where the entries of $W_{0}$ are drawn from a normal distribution with mean $\mu=10$ and standard deviation $\sigma=10$, while the entries of $W_{1}$ are drawn from a normal distribution with $\mu=\sigma=1$. Both matrices have the same sparsity structure. We refer to this problem as the time-varying max-cut relaxation (TV-MCR); it can be thought of as a convex relaxation of a max-cut problem in which the edge weights of a given graph change over time.
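An instance of this kind can be generated along the following lines. This is a hypothetical Python/NumPy sketch, not the generator used for the reported experiments; it assumes that $W_{0}$ and $W_{1}$ are symmetric with zero diagonal and reads the $50\%$ density as the probability that an off-diagonal pair $\{i,j\}$ carries an edge.

```python
import numpy as np

def random_tv_mcr(n=100, density=0.5, seed=0):
    """Hypothetical generator of (W_0, W_1) for W_t = W_0 + t W_1.

    Assumptions: symmetric matrices, zero diagonal, and a shared random
    sparsity pattern in which each pair {i, j} is an edge with prob. `density`.
    """
    rng = np.random.default_rng(seed)
    mask = np.triu(rng.random((n, n)) < density, k=1)   # strictly upper-triangular edges
    W0 = np.where(mask, rng.normal(10.0, 10.0, size=(n, n)), 0.0)
    W1 = np.where(mask, rng.normal(1.0, 1.0, size=(n, n)), 0.0)
    return W0 + W0.T, W1 + W1.T                          # symmetrize

W0, W1 = random_tv_mcr(n=100, seed=1)

def W_t(t):
    """Data matrix of the TV-MCR at time t in [0, 1]."""
    return W0 + t * W1
```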

All experiments were conducted on a personal computer with a 1.6 GHz Intel Core i5 dual-core processor and 16 GB of RAM, using a Python implementation of our path-following algorithm. Since the main goal was to illustrate the potential computational benefits of our algorithm, we did not attempt to provide the most efficient implementation. The code (https://github.com/antoniobellon/burer-monteiro-path-following, Eclipse Public License 2.0) as well as the data and experimental results (https://zenodo.org/record/7769225) are available online.

We performed experiments on $110$ instances of the TV-MCR problem with $n=100$ vertices and tracked the trajectory of solutions for $t\in[0,1]$. Among these instances, we included 10 for which the rank of the solution is not constant, hence violating our assumption (A4). These were found by computing the rank (with a tolerance of $10^{-7}$ for zero eigenvalues) of the solutions obtained with MOSEK on a 10-step subdivision of the interval $[0,1]$ and selecting ten cases in which we observed a change in the rank. Using the same procedure, we checked that for the remaining 100 instances the rank of the solution is constant at all sampled time points.
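The rank test described above can be sketched as follows. The function names are hypothetical, and the sketch assumes that the tolerance $10^{-7}$ is an absolute threshold on the eigenvalues of each computed solution; the text does not specify whether it is absolute or relative.

```python
import numpy as np

def numerical_rank(X, tol=1e-7):
    """Count eigenvalues of the symmetric matrix X above the threshold tol."""
    return int(np.sum(np.linalg.eigvalsh(X) > tol))

def rank_is_constant(solutions, tol=1e-7):
    """Check whether sampled solutions X(t_0), ..., X(t_k) share one numerical rank."""
    ranks = [numerical_rank(X, tol) for X in solutions]
    return len(set(ranks)) == 1, ranks

# Usage on synthetic low-rank matrices standing in for sampled SDP solutions.
rng = np.random.default_rng(0)
Y3 = rng.standard_normal((100, 3))
Y4 = rng.standard_normal((100, 4))
print(rank_is_constant([Y3 @ Y3.T, Y3 @ Y3.T]))  # (True, [3, 3])
print(rank_is_constant([Y3 @ Y3.T, Y4 @ Y4.T]))  # (False, [3, 4])
```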

First, we applied Algorithm 1 without step size adjustment, that is, with step size_TUNING set to FALSE, using step sizes $\Delta t=0.1,0.01,0.001$, so that 10, 100, and 1000 iterations are performed, respectively (see Figures 1 and 2). The factor dimension $r$ is chosen equal to the rank of an initial solution obtained with MOSEK with relative gap termination tolerance set to $10^{-14}$. Its distribution over the 100 constant-rank instances is shown in Table 1.

$r$	4	5	6	7
# occurrences	2	39	53	6

Table 1: Distribution of the rank $r$ over 100 instances of the TV-MCR with $n=100$ and a constant-rank solution trajectory.

Figure 1: Distribution of the average residuals as a function of the step size for three methods: an interior-point method (IPM, bordeaux), the splitting conic solver (SCS, orange), and our path-following (PF) algorithm (green). Both plots show the same data, except that the left plot additionally shows the ten rank-changing instances (light green dots), which are removed in the right plot.

Figure 1 depicts the distribution over 100 instances of the average residuals along the tracking of the solution on the time interval $[0,1]$, as a function of the step size. In each box plot, the whiskers span the interval from the minimum to the maximum, while the box spans the first to the third quartile, with a horizontal line at the median.

In the left plot, the light green dots correspond to the average residuals of the 10 rank-changing instances, whereas the right plot excludes these degenerate instances from the data set. Notice that these points correspond to TV-SDP instances that do not satisfy our assumption (A4). The green plot shows the average residual obtained by tracking the solution with Algorithm 1; the orange plot shows the average residual when the tracking is done with SCS, warm-started with the current solution and with relative and absolute feasibility tolerances set to $10^{-7}$; finally, the bordeaux plot shows the average residual when the tracking is done with MOSEK IPM [40] with relative gap termination tolerance set to $10^{-15}$.

The residual of an SDP primal-dual solution $(X,Z(\lambda))$ is defined, in analogy to (RES), as

$$\operatorname{res}_{t}(X,\lambda):=\left\|\begin{array}{c}2[C_{t}-\mathcal{A}^{*}_{t}(\lambda)]X\\ \mathcal{A}_{t}(X)-b_{t}\end{array}\right\|_{\infty}.$$
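For the TV-MCR, the data in this definition specialize to $C_{t}=W_{t}$, $\mathcal{A}_{t}(X)=\operatorname{diag}(X)$, $\mathcal{A}_{t}^{*}(\lambda)=\operatorname{Diag}(\lambda)$, and $b_{t}=\mathbf{1}$. The following Python/NumPy sketch evaluates the residual under the assumption that $\|\cdot\|_{\infty}$ denotes the largest absolute entry over both blocks; the function name is hypothetical.

```python
import numpy as np

def mcr_residual(W, X, lam):
    """Residual res_t(X, lambda) for the max-cut relaxation at a fixed time t."""
    Z = W - np.diag(lam)            # dual slack Z(lambda) = C_t - A_t^*(lambda)
    comp = 2.0 * Z @ X              # complementarity block 2 [C_t - A_t^*(lambda)] X
    feas = np.diag(X) - 1.0         # feasibility block A_t(X) - b_t
    return float(max(np.abs(comp).max(), np.abs(feas).max()))

# Usage: a single edge of weight 1; X = x x^T with x = (1, -1) and
# lambda = (-1, -1) form an optimal primal-dual pair, so the residual vanishes.
W = np.array([[0.0, 1.0], [1.0, 0.0]])
X = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(mcr_residual(W, X, np.array([-1.0, -1.0])))  # 0.0
print(mcr_residual(W, np.eye(2), np.zeros(2)))     # 2.0
```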

By choosing a suitable step size (of order $10^{-2}$ in our experiments), Algorithm 1 achieves an average residual comparable to that obtained with standard IPMs using a very small relative gap termination tolerance. For a step size of order $10^{-3}$, the average residual of our algorithm is about 100 times smaller than that of both IPM and warm-started SCS. Furthermore, as we see next, this accuracy is reached much faster with our approach.

In Figure 2, we plot the distributions of the runtimes of Algorithm 1 (green) as a function of the step size, together with the distributions of the runtimes of IPM (bordeaux), used with relative gap termination tolerance $10^{-15}$, and of warm-started SCS (orange), both used to track the solution trajectory at a constant step size.

Remarkably, for each step size that we tested, the mean runtime of Algorithm 1 is about ten times smaller than that of both SCS and MOSEK IPM, indicating the competitive computational performance of our algorithm.

Finally, we applied Algorithm 1 to the same set of TV-MCR problems with step size adjustment enabled (setting step size_TUNING to TRUE). To provide a fair comparison with MOSEK IPM, we fixed five uniform grids on the interval $[0,1]$ with 20, 40, 60, 80, and 100 equidistant points, respectively. For each grid, at each time point, we used MOSEK with a relative gap termination tolerance of $10^{-14}$ to obtain the corresponding TV-SDP solution, recording the runtime and the average residual over the tracking of each instance. For each grid, we then ran our algorithm with step size adjustment so as to achieve the same average residual accuracy as MOSEK, additionally forcing the path-following procedure to hit the grid points. In this way, we ensure that our procedure matches MOSEK both in terms of the solution residual and of the tracking resolution.
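The requirement that the path-following procedure hit the grid points can be met by shortening any step that would overshoot the next grid point. The Python sketch below shows only this step-truncation logic; a fixed maximal step size stands in for whatever step the adjustment rule of Algorithm 1 proposes (an assumption, since that rule is not reproduced here), and the function name is hypothetical.

```python
def grid_hitting_times(dt, grid):
    """Time points of a path-following run that never steps past a grid point.

    `grid` is an increasing list of target times starting at the initial time.
    Steps have length at most `dt`; a step that would overshoot the next grid
    point is shortened so that the grid point is hit exactly.
    """
    times = [grid[0]]
    t = grid[0]
    for g in grid[1:]:
        while g - t > 1e-12:
            t = min(t + dt, g)      # truncate the step at the next grid point
            times.append(t)
        t = g                       # snap to the grid point to avoid floating-point drift
        times[-1] = g
    return times

# Usage: maximal step 0.1 on a grid with 5 equidistant points in [0, 1].
grid = [i / 4 for i in range(5)]
times = grid_hitting_times(0.1, grid)
print([round(s, 3) for s in times])
```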

Figure 3 shows the distributions of the runtimes as a function of the number of grid points of both Algorithm 1 (green) and IPM with two different relative gap termination tolerances: $10^{-9}$ (Figure 3(a)) and $10^{-15}$ (Figure 3(b)).

Encouragingly, we observe that Algorithm 1 achieves both the accuracy and the tracking resolution of MOSEK at a smaller average runtime. The green plot on the right is essentially constant because, to match the residual accuracy of the IPM, the path-following procedure needs time points considerably denser than the grid points, so that its runtime is independent of the number of grid points; for the plot on the left, it instead suffices for Algorithm 1 to follow the grid.

6 Conclusion

In this paper, we proposed an algorithm for solving time-varying SDPs based on a path-following predictor-corrector scheme for the Burer–Monteiro factorization. The restriction to a horizontal space ensures that the linearized KKT system is uniquely solvable under standard regularity assumptions on the TV-SDP problem, which leads to a well-defined path-following procedure with rigorous error bounds on the distance from the optimal trajectory. Preliminary numerical experiments on a time-varying version of the max-cut SDP relaxation suggest that our algorithm is competitive with standard IPMs in terms of both runtime and accuracy. Future work should explore the applicability and relative merits of our approach in further applications.

So far we have assumed that the rank $r$ of the true solution curve is known and remains constant. While this is appropriate for a rigorous analysis such as the one conducted in this work, it might be restrictive in practice. An important extension would hence be to develop rank-adaptive versions of our path-following approach that detect and adjust the appropriate rank in the Burer–Monteiro factorization, for example, by monitoring the smallest singular values of the matrices $Y_{t}$.
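One possible form of such monitoring is sketched below in Python/NumPy as a hypothetical illustration of the extension discussed above; the relative threshold and both function names are our assumptions, not part of Algorithm 1. It counts the singular values of $Y_{t}$ above a relative threshold and, if the count drops, truncates the factor via its singular value decomposition. If $Y=U\Sigma V^{T}$, then $YY^{T}=U\Sigma^{2}U^{T}$, so the truncated factor $U_{k}\Sigma_{k}$ yields the best rank-$k$ approximation $U_{k}\Sigma_{k}^{2}U_{k}^{T}$ of $X=YY^{T}$.

```python
import numpy as np

def estimated_rank(Y, rel_tol=1e-6):
    """Number of singular values of Y above rel_tol times the largest one."""
    s = np.linalg.svd(Y, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

def truncate_factor(Y, k):
    """n-by-k factor Y_k such that Y_k Y_k^T is the best rank-k approximation of Y Y^T."""
    U, s, _ = np.linalg.svd(Y, full_matrices=False)
    return U[:, :k] * s[:k]

# Usage: a factor with r = 4 columns whose product Y Y^T has rank 2.
rng = np.random.default_rng(0)
Y = rng.standard_normal((100, 2)) @ rng.standard_normal((2, 4))
k = estimated_rank(Y)
Y_k = truncate_factor(Y, k)
print(k, np.allclose(Y_k @ Y_k.T, Y @ Y.T))  # 2 True
```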

Another important aspect is the initialization of the method: it requires an accurate SDP solution and is currently not based on the Burer–Monteiro factorization, which undermines the computational efficiency of the whole approach. The obvious remedy is to also solve the initial problem using the factorized approach [16]. The meta-algorithm presented in [36] even does this in a rank-adaptive way. Although the factorized problem is nonconvex, several works, including [13, 48, 19], have considered Burer–Monteiro schemes with guaranteed and certifiable convergence to a globally optimal low-rank factor under mild conditions, making this a reliable approach in practice.
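For the max-cut relaxation (MCR), the factorized problem has a particularly simple form: the constraint $\operatorname{diag}(YY^{T})=\mathbf{1}$ says that every row of $Y\in\mathbb{R}^{n\times r}$ has unit norm. The Python/NumPy sketch below minimizes $\langle W,YY^{T}\rangle$ over such factors with a plain Riemannian gradient method (tangent projection of the Euclidean gradient $2WY$, row normalization as retraction, Armijo backtracking). It is a hypothetical illustration of a factorized initial solve, not the augmented Lagrangian algorithm of [16], the meta-algorithm of [36], or the authors' initialization; it assumes $W$ is symmetric and returns an approximate stationary point without a global optimality certificate.

```python
import numpy as np

def normalize_rows(Y):
    """Project each row of Y onto the unit sphere."""
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)

def bm_maxcut(W, r, iters=200, seed=0):
    """Sketch: minimize <W, Y Y^T> over n-by-r factors Y with unit-norm rows."""
    rng = np.random.default_rng(seed)
    Y = normalize_rows(rng.standard_normal((W.shape[0], r)))
    f = float(np.sum(W * (Y @ Y.T)))
    for _ in range(iters):
        G = 2.0 * W @ Y                                    # Euclidean gradient (W symmetric)
        R = G - np.sum(G * Y, axis=1, keepdims=True) * Y   # project rows onto tangent spaces
        gnorm2 = float(np.sum(R * R))
        if gnorm2 < 1e-20:
            break
        step, accepted = 1.0, False
        while step > 1e-12:
            Y_new = normalize_rows(Y - step * R)           # retraction: renormalize rows
            f_new = float(np.sum(W * (Y_new @ Y_new.T)))
            if f_new <= f - 1e-4 * step * gnorm2:          # Armijo sufficient decrease
                accepted = True
                break
            step *= 0.5
        if not accepted:
            break
        Y, f = Y_new, f_new
    return Y, f

# Usage: a 4-cycle with unit weights and factor rank r = 3.
W = np.zeros((4, 4))
for i in range(4):
    W[i, (i + 1) % 4] = W[(i + 1) % 4, i] = 1.0
Y, f = bm_maxcut(W, r=3)
print(f)
```

The resulting factor $Y$ could then serve as the initial point $\hat{Y}_{0}$, with $X=YY^{T}$ feasible for (MCR) by construction.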

Acknowledgments

The research leading to these results received funding from the OP RDE under Grant Agreement CZ.02.1.01/0.0/0.0/16_019/0000765. The first author gratefully acknowledges the support of the Czech Science Foundation (grant 22-15524S). The authors also thank two anonymous referees for their helpful comments.

References

[1] S. Aaronson, X. Chen, E. Hazan, S. Kale, and A. Nayak. Online learning of quantum states. J. Stat. Mech. Theory Exp., 2019: article 124019 (14 pages), 2019.
[2] P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization algorithms on matrix manifolds. Princeton University Press, Princeton, NJ, 2008.
[3] A. A. Ahmadi and B. El Khadir. Time-varying semidefinite programs. Math. Oper. Res., 46(3):1054–1080, 2021.
[4] F. Alizadeh, J.-P. A. Haeberly, and M. L. Overton. Complementarity and nondegeneracy in semidefinite programming. Math. Program., 77(2, Ser. B):111–128, 1997.
[5] E. L. Allgower and K. Georg. Introduction to numerical continuation methods. SIAM, Philadelphia, 2003.
[6] E. D. Andersen, C. Roos, and T. Terlaky. On implementing a primal-dual interior-point method for conic quadratic optimization. Math. Program., 95:249–277, 2003.
[7] E. J. Anderson. A Continuous Model For Job-Shop Scheduling. PhD thesis, University of Cambridge, Cambridge, 1978.
[8] R. Balan and C. B. Dock. Lipschitz analysis of generalized phase retrievable matrix frames. SIAM J. Matrix Anal. Appl., 43(3):1518–1571, 2022.
[9] A. Barvinok. Problems of distance geometry and convex properties of quadratic maps. Discrete Comput. Geom., 13(2):189–202, 1995.
[10] A. Barvinok. A remark on the rank of positive semidefinite matrices subject to affine constraints. Discrete Comput. Geom., 25(1):23–31, 2001.
[11] R. Bellman. Bottleneck problems and dynamic programming. Proc. Nat. Acad. Sci. USA, 39:947–951, 1953.
[12] A. Bellon, D. Henrion, V. Kungurtsev, and J. Mareček. Time-varying semidefinite programming: Geometry of the trajectory of solutions. arXiv:2104.05445, 2021.
[13] N. Boumal. A Riemannian low-rank method for optimization over semidefinite matrices with block-diagonal constraints. arXiv:1506.00575, 2015.
[14] N. Boumal, V. Voroninski, and A. Bandeira. The non-convex Burer-Monteiro approach works on smooth semidefinite programs. In D. Lee et al., editors, Advances in Neural Information Processing Systems, volume 29, pages 2757–2765. Curran Associates, Inc., 2016.
[15] N. Boumal, V. Voroninski, and A. S. Bandeira. Deterministic guarantees for Burer-Monteiro factorizations of smooth semidefinite programs. Comm. Pure Appl. Math., 73(3):581–608, 2020.
[16] S. Burer and R. D. C. Monteiro. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Math. Program., 95(2, Ser. B):329–357, 2003.
[17] S. Burer and R. D. C. Monteiro. Local minima and convergence in low-rank semidefinite programming. Math. Program., 103(3, Ser. A):427–444, 2005.
[18] D. Cifuentes. On the Burer-Monteiro method for general semidefinite programs. Optim. Lett., 15(6):2299–2309, 2021.
[19] D. Cifuentes and A. Moitra. Polynomial time guarantees for the Burer-Monteiro method. In S. Koyejo et al., editors, Advances in Neural Information Processing Systems, volume 35, pages 23923–23935. Curran Associates, Red Hook, NY, 2022.
[20] M. Colombo, J. Gondzio, and A. Grothey. A warm-start approach for large-scale stochastic linear programs. Math. Program., 127(2, Ser. A):371–397, 2011.
[21] G. B. Dantzig. Large-scale systems optimizations with application to energy. Technical Report SOL 77-3, April 1977.
[22] Q. T. Dinh, C. Savorgnan, and M. Diehl. Adjoint-based predictor-corrector sequential convex programming for parametric nonlinear optimization. SIAM J. Optim., 22(4):1258–1284, 2012.
[23] A. Engau, M. F. Anjos, and A. Vannelli. On interior-point warmstarts for linear and combinatorial optimization. SIAM J. Optim., 20(4):1828–1861, 2010.
[24] R. M. Freund. On the behavior of the homogeneous self-dual model for conic convex optimization. Math. Program., 106(3, Ser. A):527–545, 2006.
[25] M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM, 42(6):1115–1145, 1995.
[26] D. Goldfarb and K. Scheinberg. On parametric semidefinite programming. In Proceedings of the Stieltjes Workshop on High Performance Optimization Techniques (HPOPT ’96), pages 361–377. Elsevier, Amsterdam, Appl. Numer. Math. 29, 1999.
[27] J. Gondzio and A. Grothey. Reoptimization with the primal-dual interior point method. SIAM J. Optim., 13(3):842–864, 2002.
[28] J. Gondzio and A. Grothey. A new unblocking technique to warmstart interior point methods based on sensitivity analysis. SIAM J. Optim., 19(3):1184–1210, 2008.
[29] J. Guddat, F. Guerra Vazquez, and H. T. Jongen. Parametric optimization: singularities, pathfollowing and jumps. B. G. Teubner, Stuttgart; John Wiley & Sons, Ltd., Chichester, 1990.
[30] J. D. Hauenstein, A. Mohammad-Nezhad, T. Tang, and T. Terlaky. On computing the nonlinearity interval in parametric semidefinite optimization. Math. Oper. Res., 47(4):2989–3009, 2022.
[31] M. Henzinger, A. Noe, and C. Schulz. Practical fully dynamic minimum cut algorithms. In 2022 Proceedings of the Symposium on Algorithm Engineering and Experiments (ALENEX), pages 13–26. SIAM, Philadelphia, 2022.
[32] R. A. Horn and C. R. Johnson. Matrix analysis. Cambridge University Press, Cambridge, second edition, 2013.
[33] J. Im and H. Wolkowicz. A strengthened Barvinok-Pataki bound on SDP rank. Oper. Res. Lett., 49(6):837–841, 2021.
[34] F. Jarre. An interior-point method for minimizing the maximum eigenvalue of a linear combination of matrices. SIAM J. Control Optim., 31(5):1360–1377, 1993.
[35] H. Jiang, T. Kathuria, Y. T. Lee, S. Padmanabhan, and Z. Song. A faster interior point method for semidefinite programming. In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science, pages 910–918. IEEE Computer Society, Los Alamitos, CA, 2020.
[36] M. Journée, F. Bach, P.-A. Absil, and R. Sepulchre. Low-rank optimization on the cone of positive semidefinite matrices. SIAM J. Optim., 20(5):2327–2351, 2010.
[37] E. Kao, V. Gadepally, M. Hurley, M. Jones, J. Kepner, S. Mohindra, P. Monticciolo, A. Reuther, S. Samsi, W. Song, D. Staheli, and S. Smith. Streaming graph challenge: Stochastic block partition. In 2017 IEEE High Performance Extreme Computing Conference (HPEC), pages 1–12. IEEE, Piscataway, NJ, 2017.
[38] J. Lavaei and S. H. Low. Zero duality gap in optimal power flow problem. IEEE Transactions on Power Systems, 27(1):92–107, 2012.
[39] E. Massart and P.-A. Absil. Quotient geometry with simple geodesics for the manifold of fixed-rank positive-semidefinite matrices. SIAM J. Matrix Anal. Appl., 41(1):171–198, 2020.
[40] MOSEK ApS. The MOSEK optimization toolbox for MATLAB manual. Version 9.3, 2019.
[41] Y. Nazarathy and G. Weiss. Near optimal control of queueing networks over a finite time horizon. Ann. Oper. Res., 170:233–249, 2009.
[42] Y. Nesterov. Lectures on convex optimization. Springer, Cham, Switzerland, 2018.
[43] J. Nocedal and S. Wright. Numerical optimization. Springer, New York, second edition, 2006.
[44] B. O’Donoghue. Operator splitting for a homogeneous embedding of the linear complementarity problem. SIAM J. Optim., 31(3):1999–2023, 2021.
[45] B. O’Donoghue, E. Chu, N. Parikh, and S. Boyd. Conic optimization via operator splitting and homogeneous self-dual embedding. J. Optim. Theory Appl., 169(3):1042–1068, 2016.
[46] B. O’Donoghue, E. Chu, N. Parikh, and S. Boyd. SCS: Splitting Conic Solver, version 3.2.2. https://github.com/cvxgrp/scs, Nov. 2022.
[47] G. Pataki. On the rank of extreme matrices in semidefinite programs and the multiplicity of optimal eigenvalues. Math. Oper. Res., 23(2):339–358, 1998.
[48] D. M. Rosen. Scalable low-rank semidefinite programming for certifiably correct machine perception. In Algorithmic Foundations of Robotics XIV, pages 551–566. Springer, Cham, Switzerland, 2021.
[49] D. M. Rosen, L. Carlone, A. S. Bandeira, and J. J. Leonard. A certifiably correct algorithm for synchronization over the special Euclidean group. In Algorithmic Foundations of Robotics XII, pages 64–79. Springer, Cham, Switzerland, 2020.
[50] B. A. Schmitt. Perturbation bounds for matrix square roots and Pythagorean sums. Linear Algebra Appl., 174:215–227, 1992.
[51] A. Skajaa, E. D. Andersen, and Y. Ye. Warmstarting the homogeneous and self-dual interior point method for linear and conic quadratic problems. Math. Program. Comput., 5(1):1–25, 2013.
[52] F. Teren. Minimum time acceleration of aircraft turbofan engines by using an algorithm based on nonlinear programming. NASA Technical Memorandum TM-73741, Lewis Research Center, Cleveland, Ohio, September 1977.
[53] L. Tunçel. Potential reduction and primal-dual methods. In Handbook of semidefinite programming, volume 27 of Internat. Ser. Oper. Res. Management Sci., pages 235–265. Kluwer Acad., Boston, MA, 2000.
[54] I. Waldspurger and A. Waters. Rank optimality for the Burer-Monteiro factorization. SIAM J. Optim., 30(3):2577–2602, 2020.
[55] X. Wang, S. Zhang, and D. D. Yao. Separated continuous conic programming: strong duality and an approximation algorithm. SIAM J. Control Optim., 48(4):2118–2138, 2009.
[56] H. Wolkowicz, R. Saigal, and L. Vandenberghe, editors. Handbook of semidefinite programming. Kluwer Academic, Boston, MA, 2000.
[57] S. J. Wright. An algorithm for degenerate nonlinear programming with rapid local convergence. SIAM J. Optim., 15(3):673–696, 2005.



3 Quotient geometry of positive semidefinite rank- $\bm{r}$ matrices