Time-Varying Semidefinite Programming:
Path Following a Burer–Monteiro Factorization
Abstract
We present an online algorithm for time-varying semidefinite programs (TV-SDPs), based on the tracking of the solution trajectory of a low-rank matrix factorization, also known as the Burer–Monteiro factorization, in a path-following procedure. There, a predictor-corrector algorithm solves a sequence of linearized systems. This requires the introduction of a horizontal space constraint to ensure the local injectivity of the low-rank factorization. The method produces a sequence of approximate solutions for the original TV-SDP problem, which we show stay close to the optimal solution path if properly initialized. Numerical experiments for a time-varying max-cut SDP relaxation demonstrate the computational advantages of the proposed method for tracking TV-SDPs in terms of runtime compared to off-the-shelf interior point methods.
Key words. Semidefinite programming; nonlinear programming; parametric optimization;
time-varying constrained optimization; Newton type methods
MSC codes. 49M15, 90C22, 90C30, 90C31
1 Introduction
Semidefinite programs (SDPs) constitute an important class of convex constrained optimization problems that is ubiquitous in statistics, signal processing, control systems, and other areas. In several applications, the data of the problem vary over time, so that this can be modeled as a time-varying SDP (TV-SDP). In this paper we consider TV-SDPs of the form
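$$\min_{X \in \mathbb{S}^n} \ \langle C_t, X \rangle \quad \text{s.t.} \quad \mathcal{A}_t(X) = b_t, \quad X \succeq 0, \tag{SDP}$$
where $t \in [0,T]$ is a time parameter varying on a bounded interval. Here $\mathbb{S}^n$ denotes the space of real symmetric $n \times n$ matrices, $\mathcal{A}_t \colon \mathbb{S}^n \to \mathbb{R}^m$ is a linear operator defined by $\mathcal{A}_t(X) = (\langle A_{1,t}, X \rangle, \dots, \langle A_{m,t}, X \rangle)^T$ for some $A_{1,t}, \dots, A_{m,t} \in \mathbb{S}^n$, $b_t \in \mathbb{R}^m$, and $C_t \in \mathbb{S}^n$. Throughout the paper $\langle \cdot, \cdot \rangle$ denotes the Frobenius inner product and the constraint $X \succeq 0$ requires $X$ to be positive semidefinite. In this time-varying setting, one looks for a solution curve $t \mapsto X_t$ in $\mathbb{S}^n$ such that $X_t$ is an optimal solution for (SDP) at each time point $t \in [0,T]$.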
Time-dependent problems leading to TV-SDPs occur in various applications, such as optimal power flow problems in power systems [38], state estimation problems in quantum systems [1], modeling of energy economic problems [21], job-shop scheduling problems [7], as well as problems arising in signal processing, queueing theory [41], or aircraft engineering [52]. TV-SDPs can be seen as a generalization of continuous linear programming problems, which were first studied by Bellman [11] in relation to so-called bottleneck problems in multistage linear production economic processes. Since then, a large body of literature has been devoted to studying continuous linear programs with and without additional assumptions. However, the generalization of this idea to other classes of optimization problems has only recently been considered. In [55], Wang, Zhang, and Yao study continuous conic programs, and finally Ahmadi and Khadir [3] consider time-varying SDPs. In contrast to our setting, they require the data to vary polynomially with time and also restrict themselves to polynomial solutions. Moreover, the problems studied there involve kernel terms and more complicated constraints, while our work addresses TV-SDPs in a simpler sense of univariate parametric SDPs, following the literature thread of [26, 30].
A naive approach to solve the time-varying problem (SDP) is to consider, at a sequence of times $\{t_k\} \subset [0,T]$, the instances of the problem (SDP) for $t = t_k$ and solve them one after another. The best solvers for SDPs are interior point methods [34, 53, 6, 24, 35], which can solve them in time polynomial in the input size. However, these solvers do not scale particularly well, and thus this brute-force approach may fail in applications where the volume and velocity of the data are large. Furthermore, such a straightforward method would not make use of the local information collected by solving the previous instances of the problem. Even if one considers warm starts [27, 28, 23, 20, 51], the reduction in runtime is likely to be marginal. For instance, [51, sections 5.5 and 5.6] reports a 30–60% reduction of the runtime on a collection of time-varying instances of their own choice.
Instead, in this work, we would like to utilize the idea of so-called path-following predictor-corrector algorithms as developed in [29, 5]. In classical predictor-corrector methods, a predictor step for approximating the directional derivative of the solution with respect to a small change in the time parameter is applied, together with a correction step that moves from the current approximate solution closer to the next solution at the new time point. The latter is based on a Newton step for solving the first-order optimality KKT conditions.
A limiting factor in solving both stationary and time-dependent SDPs is computational complexity when $n$ is large. A common remedy is the Burer–Monteiro approach, as presented in the seminal work [16, 17]. In this approach, a low-rank factorization $X = YY^T$ of the solution is assumed with $Y \in \mathbb{R}^{n \times r}$ and $r$ potentially much smaller than $n$. In the optimization literature, the Burer–Monteiro method has been very well studied as a nonconvex optimization problem, e.g., in terms of algorithms [36], quality of the optimal value [9, 47], and (global) recovery guarantees [14, 15, 18].
In a time-varying setting, the Burer–Monteiro factorization leads to
$$\min_{Y \in \mathbb{R}^{n \times r}} \ \langle C_t, YY^T \rangle \quad \text{s.t.} \quad \mathcal{A}_t(YY^T) = b_t, \tag{BM}$$
which for every fixed $t$ is a quadratically constrained quadratic problem. A solution then is a curve $t \mapsto Y_t$ in $\mathbb{R}^{n \times r}$, which, depending on $r$, is a space of much smaller dimension than $\mathbb{S}^n$. However, this comes at the price that the problem (BM) is now nonconvex. Moreover, theoretically it may happen that local optimization methods converge to a critical point that is not globally optimal [54], although in practice the method usually shows very good performance [16, 36, 49].
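As a minimal sketch of how the factorized formulation can be evaluated without forming an $n \times n$ matrix, the snippet below computes the objective $\langle C, YY^T \rangle$ and the constraint residual $\mathcal{A}(YY^T) - b$ directly from a factor $Y$. The function names, the list-of-matrices representation of the operator, and the random test data are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def bm_objective(C, Y):
    """<C, Y Y^T> computed as tr(Y^T C Y), without forming Y Y^T."""
    return float(np.sum(Y * (C @ Y)))

def bm_constraint_residual(A_list, b, Y):
    """Residual A(Y Y^T) - b, assuming A(X)_i = <A_i, X> for matrices A_i."""
    return np.array([np.sum(Y * (A_i @ Y)) for A_i in A_list]) - b

# Hypothetical small instance with random symmetric data.
rng = np.random.default_rng(0)
n, r, m = 6, 2, 3
sym = lambda M: 0.5 * (M + M.T)
C = sym(rng.standard_normal((n, n)))
A_list = [sym(rng.standard_normal((n, n))) for _ in range(m)]
Y = rng.standard_normal((n, r))

X = Y @ Y.T                                    # full-space matrix, for comparison only
b = np.array([np.sum(A_i * X) for A_i in A_list])

obj_factor = bm_objective(C, Y)                # evaluated from the factor
obj_full = float(np.sum(C * X))                # evaluated in the full space
res = bm_constraint_residual(A_list, b, Y)     # zero by construction of b
```

Both evaluations agree up to rounding; the factor-based version only needs products of $n \times n$ data with $n \times r$ factors.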
The aim of this work is to combine the Burer–Monteiro factorization with path-following predictor-corrector methods and to develop a practical algorithm for approximating the solution of (BM), and consequently of (SDP), over time. As we explain in section 3, to apply such methods, we need to address the issue that the solutions of (BM) are never isolated, due to the nonuniqueness of the Burer–Monteiro factorization caused by orthogonal invariance. In this paper, we apply a well-known technique to handle this problem by restricting the solutions to a so-called horizontal space at every time step. From a geometric perspective, such an approach exploits the fact that equivalent factorizations can be identified as the same element in the corresponding quotient manifold with respect to the orthogonal group action [39].
The paper is structured as follows. In section 2 we review important foundations from the SDP literature and state the main assumptions we make on the TV-SDP problem (SDP). Section 3 presents the underlying quotient geometry of positive semidefinite rank-$r$ matrices from a linear algebra perspective, focusing in particular on the notion of horizontal space and the domain of injectivity of the map $Y \mapsto YY^T$. We then describe in section 4 our path-following predictor-corrector algorithm, which is based on iteratively solving the linearized KKT system for (BM) over time. A main result is the rigorous error analysis for this algorithm presented in subsection 4.3. In section 5, we showcase numerical results that test our method on a time-varying variant of the well-known Goemans–Williamson SDP relaxation for the Max-Cut problem in combinatorial optimization and graph theory. We conclude in section 6 with a brief discussion of our results.
2 Preliminaries and key assumptions
Naturally, the rigorous formulation of path-following algorithms requires regularity assumptions on the solution curve. In our context, this will require both assumptions on the original TV-SDP problem (SDP) as well as on its reformulation (BM). In particular for the latter, the correct choice of the dimension is crucial. In what follows, we present and discuss these assumptions in detail.
First, we briefly review some standard notions and properties for primal-dual SDP pairs; see [4, 56]. Consider the conic dual problem of (SDP):
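$$\max_{y \in \mathbb{R}^m} \ \langle b_t, y \rangle \quad \text{s.t.} \quad Z(y) := C_t - \mathcal{A}_t^*(y) \succeq 0, \tag{D-SDP}$$
where $\mathcal{A}_t^*$ is the linear operator adjoint to $\mathcal{A}_t$. For convenience, we often drop the explicit dependence on $y$ and refer to a solution of (D-SDP) simply as $Z_t$. While reviewing the basic properties of SDPs, we assume the time parameter $t$ to be fixed and hence omit the subindex $t$.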
The KKT conditions for the pair of primal-dual convex problems (SDP)-(D-SDP) read
$$\mathcal{A}(X) = b, \quad X \succeq 0, \quad Z(y) \succeq 0, \quad X Z(y) = 0. \tag{2.1}$$
These are sufficient conditions for the optimality of the pair $(X, y)$.
Definition 2.1 (strict feasibility).
We say that strict feasibility holds for an instance of primal SDP if there exists a positive definite matrix $X \succ 0$ that satisfies $\mathcal{A}(X) = b$. Similarly, strict feasibility holds for the dual if there exists a vector $y$ satisfying $Z(y) \succ 0$.
It is well known that under strict feasibility the KKT conditions are also necessary for optimality. Note that, in general, a pair of optimal solutions $(X, y)$ satisfies the inclusions $\operatorname{im} X \subseteq \ker Z(y)$ and $\operatorname{im} Z(y) \subseteq \ker X$, where $\operatorname{im}$ and $\ker$ denote the image and kernel, respectively.
Definition 2.2 (strict complementarity).
A primal-dual optimal point $(X, y)$ is said to be strictly complementary if $\operatorname{im} X = \ker Z(y)$ (or, equivalently, $\operatorname{im} Z(y) = \ker X$). A primal-dual pair of an instance of SDP satisfies strict complementarity if there exists a strictly complementary primal-dual optimal point $(X, y)$.
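To make the notion concrete, the following worked toy instance (an illustration under our own choice of data, not an example from the paper) checks strict complementarity by hand. It assumes the standard characterization that an optimal pair with dual slack $Z$ is strictly complementary exactly when $\operatorname{rank} X + \operatorname{rank} Z = n$.

```latex
% Illustrative data (assumed): n = 2, m = 1, C = diag(0, 1), A(X) = tr(X), b = 1,
% so that A^*(y) = y I and Z(y) = C - y I.
\begin{aligned}
&\text{Primal:} && \min_{X \succeq 0} X_{22} \ \text{ s.t. } \operatorname{tr}(X) = 1
  &&\Longrightarrow\quad X^\star = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \\
&\text{Dual:} && \max_{y} \ y \ \text{ s.t. } C - y I \succeq 0
  &&\Longrightarrow\quad y^\star = 0, \quad Z^\star = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \\
&\text{Check:} && X^\star Z^\star = 0, \qquad
  \operatorname{im} X^\star = \operatorname{span}\{e_1\} = \ker Z^\star, \qquad
  \operatorname{rank} X^\star + \operatorname{rank} Z^\star = 2 = n.
\end{aligned}
```

Both optimal values equal $0$, so the duality gap vanishes, and the pair is strictly complementary.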
Definition 2.3 (nondegeneracy).
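A primal feasible point $X$ is primal nondegenerate if
$$\ker \mathcal{A} + \mathcal{T}_X = \mathbb{S}^n, \tag{2.2}$$
with $\mathcal{T}_X$ being the tangent space to the manifold of fixed rank-$r$ symmetric matrices at $X$, where $r = \operatorname{rank}(X)$. Let $X = YY^T$ with $Y \in \mathbb{R}^{n \times r}$ be a rank-revealing decomposition, then
$$\mathcal{T}_X = \{ YV^T + VY^T : V \in \mathbb{R}^{n \times r} \}.$$
A dual feasible point $y$ is dual nondegenerate if
$$\operatorname{im} \mathcal{A}^* + \mathcal{T}_{Z(y)} = \mathbb{S}^n,$$
where $\mathcal{T}_{Z(y)}$ is the tangent space at $Z(y)$ to the manifold of fixed rank-$s$ symmetric matrices with $s = \operatorname{rank}(Z(y))$.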
Primal-dual strict feasibility implies the existence of both a primal and a dual optimal solution with a zero duality gap. In addition, primal (dual) nondegeneracy implies dual (primal) uniqueness of the solutions. Under strict complementarity, the converse is also true, that is, primal (dual) uniqueness of the primal-dual optimal solutions implies dual (primal) nondegeneracy of these solutions. Moreover, primal-dual nondegeneracy and strict complementarity hold generically. We refer to [4] for details.
With regard to the time-varying case, these facts can be generalized as follows.
Theorem 2.4.
(Bellon et al., [12, Theorem 2.19]) Let (P,D) be a primal-dual pair of TV-SDPs parametrized over a time interval $[0,T]$ such that primal-dual strict feasibility holds for any $t \in [0,T]$ and assume that the data $\mathcal{A}_t, b_t, C_t$ are continuously differentiable functions of $t$. Let $t_0 \in [0,T]$ be a fixed value of the time parameter and suppose that $(X_{t_0}, Z_{t_0})$ is a nondegenerate optimal and strictly complementary point for (P,D). Then there exists $\varepsilon > 0$ and a continuously differentiable unique mapping $t \mapsto (X_t, Z_t)$ defined on $(t_0 - \varepsilon, t_0 + \varepsilon)$ such that $(X_t, Z_t)$ is a unique and strictly complementary primal-dual optimal point to (P,D) for all $t \in (t_0 - \varepsilon, t_0 + \varepsilon)$. In particular, the ranks of $X_t$ and $Z_t$ are constant for all $t \in (t_0 - \varepsilon, t_0 + \varepsilon)$.
The last statement of the theorem directly follows from the fact that a change in the rank of either $X_t$ or $Z_t$ implies a loss of strict complementarity because of the lower-semicontinuity of the rank. Based on these facts, for the initial problem (SDP) we make the following assumptions.
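- (A1) Problem (SDP) and its dual (D-SDP) are strictly feasible for any $t \in [0,T]$.
- (A2) The linear operator $\mathcal{A}_t$ is surjective for any $t \in [0,T]$.
- (A3) Problem (SDP) has a primal nondegenerate and dual nondegenerate optimal solution pair for any $t \in [0,T]$.
- (A4) The solution pair $(X_t, Z_t)$ is strictly complementary for any $t \in [0,T]$.
- (A5) Data $\mathcal{A}_t, b_t, C_t$ are continuously differentiable functions of $t$.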
Assumptions (A1) and (A2) are standard for SDPs and in linearly constrained optimization in general, while assumptions (A3)–(A4) rule out many “pathological” cases [12]. In particular, assumption (A3) implies that the solution pair is unique. By Theorem 2.4, assumptions (A3), (A4), and (A5) have the following consequences:
- (C1) (SDP) has a unique and smooth solution curve $X_t$, $t \in [0,T]$.
- (C2) The curve $X_t$ is of constant rank $r$.
For setting up the factorized version (BM) of (SDP), it is necessary to choose the dimension of the factor matrix in (BM), ideally equal to of (C2).
In what follows, we assume that we know the constant rank .
Given access to an initial solution at time , it is possible to compute , so this assumption is without further loss of generality.
It is worth noting that the rank $r$ cannot be arbitrary. Based on a known result of Barvinok and Pataki [10], for any SDP defined by $m$ linearly independent constraints, there always exists a solution of rank $r$ such that $\frac{r(r+1)}{2} \le m$. Since we assume that $X_t$ is the unique solution to (SDP) with constant rank $r$, we conclude that
$$\frac{r(r+1)}{2} \le m.$$
We point out that recently the Barvinok–Pataki bound has been slightly improved [33].
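The Barvinok–Pataki bound $r(r+1)/2 \le m$ is easy to evaluate for a given number of constraints. The helper below is a small sketch; the function name is our own, and it only encodes the classical bound, not the improved one from [33].

```python
import math

def max_rank_barvinok_pataki(m: int) -> int:
    """Largest integer r with r * (r + 1) / 2 <= m (classical Barvinok-Pataki bound)."""
    if m < 0:
        raise ValueError("number of constraints must be nonnegative")
    # r(r+1)/2 <= m  <=>  2r + 1 <= sqrt(8m + 1); integer arithmetic avoids rounding issues.
    return (math.isqrt(8 * m + 1) - 1) // 2

# Example: an SDP with m = 10 linearly independent constraints admits a solution
# of rank at most 4, since 4 * 5 / 2 = 10 but 5 * 6 / 2 = 15 > 10.
r_max = max_rank_barvinok_pataki(10)
```

For a max-cut type relaxation with one constraint per vertex, $m = n$, this gives a rank of order $\sqrt{2n}$, which indicates how much smaller the factor can be than the full matrix variable.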
3 Quotient geometry of positive semidefinite rank-$r$ matrices
We now investigate the factorized formulation (BM) in more detail. As already mentioned, in contrast to the original problem (SDP), this is a nonlinear problem (specifically, a quadratically constrained quadratic problem) which is nonconvex. Moreover, the property of uniqueness of a solution, which is guaranteed by (C1) for the original problem (SDP), is lost in (BM), because its representation via the map
$$\phi \colon \mathbb{R}^{n \times r} \to \mathbb{S}^n, \quad \phi(Y) = YY^T,$$
is not unique. In fact, this map is invariant under the orthogonal group action
$$Y \mapsto YQ, \quad Q \in \mathcal{O}_r,$$
on $\mathbb{R}^{n \times r}$, where
$$\mathcal{O}_r = \{ Q \in \mathbb{R}^{r \times r} : Q^TQ = I_r \},$$
with $I_r$ denoting the $r \times r$ identity matrix, is the orthogonal group. Hence both the objective function and the constraints in (BM) are invariant under the same action. As a consequence, the solutions of (BM) are never isolated [36]. This poses a technical obstacle to the use of path-following algorithms, as the path needs to be, at least locally, uniquely defined.
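The invariance is easy to observe numerically. The following sketch uses illustrative sizes and a fixed rotation angle of our own choosing to show two clearly different factors that produce the same matrix $YY^T$.

```python
import numpy as np

def phi(Y):
    """The factorization map Y -> Y Y^T."""
    return Y @ Y.T

rng = np.random.default_rng(0)
Y = rng.standard_normal((5, 2))               # illustrative factor with n = 5, r = 2

theta = 0.7                                   # arbitrary rotation angle
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # an orthogonal 2 x 2 matrix

Y_rot = Y @ Q
factor_gap = np.linalg.norm(Y_rot - Y)              # clearly nonzero
product_gap = np.linalg.norm(phi(Y_rot) - phi(Y))   # zero up to rounding
```

Since `theta` can be varied continuously, every factor has a whole curve of equivalent factors through it, which is why solutions in the factorized variable cannot be isolated.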
On the other hand, by assuming that the correct rank $r$ of a unique solution $X_t$ for (SDP) has been chosen for the factorization, any solution $Y_t$ for (BM) must satisfy $Y_tY_t^T = X_t$. From this it follows that any solution is of the form $Y_tQ$ with $Q \in \mathcal{O}_r$; see, e.g., [17, Lemma 2.1]. In other words, the action of the orthogonal group is indeed the only source of nonuniqueness. This corresponds to the well-known fact that the set of positive semidefinite fixed rank-$r$ symmetric matrices, which we denote by $\mathcal{M}_r^+$, is a smooth manifold that can be identified with the quotient manifold $\mathbb{R}_*^{n \times r}/\mathcal{O}_r$, where $\mathbb{R}_*^{n \times r}$ is the open set of $n \times r$ matrices with full column rank.
In the following, we describe how the nonuniqueness can be removed by introducing a so-called horizontal space, which is a standard concept in optimization on quotient manifolds; see, e.g., [2, section 3.5.8]. For positive semidefinite fixed-rank matrices, this has been worked out in detail in [39]. Additional material, including the complex Hermitian case, can be found in [8]. However, in order to arrive at practical formulas that are useful for our path-following algorithm later on, we will not further refer to the concept of a quotient manifold but directly focus on the injectivity of the map $\phi \colon Y \mapsto YY^T$ on suitable linear subspaces of $\mathbb{R}^{n \times r}$, which we describe in the following section. Such a simplification takes into account that we are dealing with a quotient manifold $\mathbb{R}_*^{n \times r}/\mathcal{O}_r$ with $\mathbb{R}_*^{n \times r}$ (the $n \times r$ matrices of full column rank) being just an open subset of $\mathbb{R}^{n \times r}$. Then the horizontal space at a point $Y$ should be a subspace of the tangent space of $\mathbb{R}_*^{n \times r}$ at $Y$, which, however, is just $\mathbb{R}^{n \times r}$.
3.1 Horizontal space and unique factorizations
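Given $Y \in \mathbb{R}_*^{n \times r}$, we denote the corresponding orbit under the orthogonal group as
$$Y\mathcal{O}_r = \{ YQ : Q \in \mathcal{O}_r \}.$$
The orbit $Y\mathcal{O}_r$ is an embedded submanifold of $\mathbb{R}^{n \times r}$ of dimension $\frac{r(r-1)}{2}$ with two connected components, according to $\det Q = \pm 1$. Its tangent space at $Y$, which we denote by $\mathcal{V}_Y$, is easily derived by noting that the tangent space to the orthogonal group $\mathcal{O}_r$ at the identity matrix equals the space $\mathbb{S}^r_{\mathrm{skew}}$ of real skew-symmetric $r \times r$ matrices (see, e.g., [2, Example 3.5.3]). Therefore,
$$\mathcal{V}_Y = \{ YS : S \in \mathbb{S}^r_{\mathrm{skew}} \}.$$
Since the map $\phi(Y) = YY^T$ is constant on $Y\mathcal{O}_r$, its derivative
$$\phi'(Y)[H] = YH^T + HY^T$$
vanishes on $\mathcal{V}_Y$, that is, $\mathcal{V}_Y \subseteq \ker \phi'(Y)$.
The horizontal space at $Y$, denoted by $\mathcal{H}_Y$, is the orthogonal complement of $\mathcal{V}_Y$ with respect to the Frobenius inner product. One verifies that
$$\mathcal{H}_Y = \{ H \in \mathbb{R}^{n \times r} : Y^TH = H^TY \},$$
since $\langle YS, H \rangle = 0$ holds for all skew-symmetric $S$ if and only if $Y^TH$ is symmetric. We point out that sometimes any subspace complementary to $\mathcal{V}_Y$ is called a horizontal space, but we will stick to the above choice, as it is the most common and has certain theoretical and practical advantages. In particular, since $Y \in \mathcal{H}_Y$, the affine space $Y + \mathcal{H}_Y$ equals $\mathcal{H}_Y$, so it is just a linear space.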
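The splitting of a direction into its orbit-tangent (vertical) and horizontal parts can be sketched numerically. The snippet below assumes the characterization that $H$ is horizontal when $Y^TH$ is symmetric and that vertical directions have the form $YS$ with $S$ skew-symmetric; the function name and the eigenvalue-based solve of the small Lyapunov equation $GS + SG = Y^TH - H^TY$ with $G = Y^TY$ are our own illustrative choices.

```python
import numpy as np

def split_vertical_horizontal(Y, H):
    """Split H = Y @ S + H_hor with S skew-symmetric and Y.T @ H_hor symmetric.

    Assumes Y has full column rank, so that G = Y^T Y is positive definite.
    """
    G = Y.T @ Y
    R = Y.T @ H - H.T @ Y                      # skew-symmetric right-hand side
    lam, V = np.linalg.eigh(G)                 # G = V diag(lam) V^T
    # In the eigenbasis of G the equation G S + S G = R decouples entrywise.
    S = V @ ((V.T @ R @ V) / (lam[:, None] + lam[None, :])) @ V.T
    H_ver = Y @ S
    return H_ver, H - H_ver, S

rng = np.random.default_rng(2)
Y = rng.standard_normal((5, 2))                # illustrative sizes
H = rng.standard_normal((5, 2))
H_ver, H_hor, S = split_vertical_horizontal(Y, H)
```

The two parts are orthogonal in the Frobenius inner product, and only the horizontal part changes $YY^T$ to first order.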
The purpose of the horizontal space is to provide a unique way of representing a neighborhood of in through with . Clearly,
Moreover, the following holds.
Proposition 3.1.
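The restriction of the derivative $\phi'(Y) \colon H \mapsto YH^T + HY^T$ to the horizontal space $\mathcal{H}_Y$ is injective. In particular, it holds that
$$\|\phi'(Y)[H]\|_F \ge \sqrt{2}\, \sigma_r(Y)\, \|H\|_F \quad \text{for all } H \in \mathcal{H}_Y,$$
where $\sigma_r(Y)$ is the smallest singular value of $Y$. This lower bound is sharp if $r < n$. For $r = n$ one has the sharp estimate
$$\|\phi'(Y)[H]\|_F \ge 2\, \sigma_r(Y)\, \|H\|_F \quad \text{for all } H \in \mathcal{H}_Y.$$
As a consequence, in either case, $\ker \phi'(Y) = \mathcal{V}_Y$.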
Proof.
For we have by standard properties of the trace. Taking yields
To derive the second equality we used for . Clearly, and if we also have that . This proves the asserted lower bounds. To show that they are sharp, let be a (normalized) singular vector tuple such that . If , then for any such that one verifies that the matrix is in and achieves equality. When , achieves it. ∎
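The lower bound can be checked numerically. The sketch below takes the bound in the form $\|YH^T + HY^T\|_F \ge \sqrt{2}\,\sigma_r(Y)\|H\|_F$ for horizontal $H$ (those with $Y^TH$ symmetric) in the case $r < n$, and uses the witness described in the proof, a rank-one matrix built from a unit vector orthogonal to the range of $Y$ and the right singular vector of the smallest singular value. The sampling routine, sizes, and names are illustrative assumptions.

```python
import numpy as np

def dphi(Y, H):
    """Derivative of Y -> Y Y^T in direction H."""
    return Y @ H.T + H @ Y.T

def random_horizontal(Y, rng):
    """Random H with Y^T H symmetric: H = Y (Y^T Y)^{-1} P + W, P symmetric, Y^T W = 0."""
    n, r = Y.shape
    P = rng.standard_normal((r, r))
    P = P + P.T
    N = rng.standard_normal((n, r))
    W = N - Y @ np.linalg.lstsq(Y, N, rcond=None)[0]   # remove the part in range(Y)
    return Y @ np.linalg.solve(Y.T @ Y, P) + W

rng = np.random.default_rng(1)
n, r = 6, 3                                    # illustrative sizes with r < n
Y = rng.standard_normal((n, r))
U, s, Vt = np.linalg.svd(Y, full_matrices=True)
bound = np.sqrt(2.0) * s[-1]                   # sqrt(2) * sigma_r(Y)

# Smallest observed ratio over random horizontal directions.
min_ratio = min(
    np.linalg.norm(dphi(Y, H)) / np.linalg.norm(H)
    for H in (random_horizontal(Y, rng) for _ in range(200))
)

# Sharpness witness: H = u v_r^T with u orthogonal to range(Y).
H_sharp = np.outer(U[:, r], Vt[-1, :])
ratio_sharp = np.linalg.norm(dphi(Y, H_sharp)) / np.linalg.norm(H_sharp)
```

Random directions stay above the bound, while the witness attains it up to rounding.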
Since maps to , which is of the same dimension as , the above proposition implies that is a bijection between and . This already shows that the restriction of to the linear space is a local diffeomorphism between a neighborhood of in and a neighborhood of in . The subsequent more quantitative statement matches Theorem 6.3 in [39] on the injectivity radius of the quotient manifold . For convenience we will provide a self-contained proof that is more algebraic and does not require the concept of quotient manifolds.
Proposition 3.2.
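Let $\mathcal{B}_Y = \{ H \in \mathcal{H}_Y : \|H\|_F < \sigma_r(Y) \}$. Then the restriction of $\phi$ to $Y + \mathcal{B}_Y$ is injective and maps $Y + \mathcal{B}_Y$ diffeomorphically to a (relatively) open neighborhood of $YY^T$ in $\mathcal{M}_r^+$.
It is interesting to note that $\mathcal{B}_Y$ is the largest possible ball in $\mathcal{H}_Y$ on which the result can hold, since the rank-one matrices $\sigma_i u_i v_i^T$ comprised of singular pairs of $Y$ all belong to $\mathcal{H}_Y$ and $Y - \sigma_r u_r v_r^T$ is rank-deficient. Another important observation is that $\sigma_r(Y)$ does not depend on the particular choice of $Y$ within the orbit $Y\mathcal{O}_r$.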
Proof.
Consider . Let be a singular value decomposition of with and having orthonormal columns. We assume . Then by we denote a matrix with orthonormal columns and . In the case , the terms involving in the following calculation are simply not present. We write
Since , we have
Then a direct calculation yields
and analogously for . Since the four terms in the above sum are mutually orthogonal in the Frobenius inner product, the equality particularly implies
as well as
(3.1) |
The first of these equations can be written as
By Proposition 3.1 (with , and ),
whereas
Since , this shows that we must have , which then by (3.1) also implies , since is invertible.
Hence, we have proven that is an injective map from to . To validate that it is a diffeomorphism onto its image we show that it is locally a diffeomorphism, for which again it suffices to confirm that is injective on for every (since and have the same dimension). It follows from Proposition 3.1 (with replaced by , which has full column rank) that the null space of equals . We claim that , which proves the injectivity of on . Indeed, let be an element in the intersection, i.e., for some skew-symmetric and . Inserting the first relation into the second, and using , yields the homogeneous Lyapunov equation
(3.2) |
The symmetric matrix
in (3.2) is positive definite, since (here denotes the -th eigenvalue of the corresponding matrix). But in this case (3.2) implies , that is, . ∎
Finally, it is also possible to provide a lower bound on the radius of the largest ball around such that its intersection with is in the image (so that an inverse map is defined).
Proposition 3.3.
Any satisfying is in the image , that is, there exists a unique such that .
Observe that one could take
(3.3) |
as a slightly cleaner sufficient condition in the proposition.
Proof.
Let with and assume a polar decomposition of , where , is positive semidefinite, and is orthogonal. Let . Then
(3.4) |
satisfies , and since is symmetric, we have . We need to show , that is, . Proposition 3.2 then implies that is unique in . Let be the orthogonal projector onto the column span of and . With that, we have the decomposition
(3.5) |
We estimate both terms separately. Since is symmetric and positive semidefinite, the first term satisfies
(3.6) |
A simple consideration using a singular value decomposition of and reveals that
for some and with orthonormal columns. Consequently, by von Neumann’s trace inequality (see, e.g., [32, Theorem 7.4.1.1]), we have
Inserting this into (3.1) yields
We remark that we could have concluded this inequality from [8, Theorem 2.7] where it is also stated. It actually holds for any for which is symmetric and positive semidefinite using the same argument (in particular for replaced with the initial ). Let now be a singular value decomposition of with the smallest positive singular value. Then for some positive semidefinite and it follows from well-known results (cf. [50]) that¹
¹ For completeness we provide the proof. The matrix is the unique solution to the matrix equation . Indeed, the linear operator on is symmetric in the Frobenius inner product and has positive eigenvalues (the eigenvectors are rank-one matrices with the eigenvectors of ). Hence .
Noting that and we conclude the first part with
(3.7) |
The second term in (3.5) can be estimated as follows:
(3.8) |
where we used the Cauchy–Schwarz inequality and the fact that has rank at most .
Remark 3.4.
From definition (3.4) of , since is given by the polar decomposition , it follows that
see, e.g., [32, section 7.4.5]. In general, given any , both of rank , the minimizer in this problem is necessarily obtained by choosing from the polar decomposition of so that is necessarily symmetric, that is, and hence are in the horizontal space . In fact, the quantity defines a Riemannian distance between the orbits and in the corresponding quotient manifold; see [39, Proposition 5.1].
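The alignment described in this remark is an orthogonal Procrustes problem and can be sketched in a few lines. The snippet assumes the distance takes the form $\min_{Q} \|Y_1 - Y_2 Q\|_F$ over orthogonal $Q$, solved through a singular value decomposition of $Y_2^T Y_1$ (equivalent to taking the orthogonal polar factor); function and variable names are illustrative.

```python
import numpy as np

def align_factor(Y_ref, Y_other):
    """Return Y_other @ Q, with Q orthogonal minimizing ||Y_ref - Y_other @ Q||_F."""
    U, _, Vt = np.linalg.svd(Y_other.T @ Y_ref)
    return Y_other @ (U @ Vt)                  # U @ Vt is the orthogonal polar factor

rng = np.random.default_rng(3)
Y1 = rng.standard_normal((6, 2))
theta = 1.1                                    # arbitrary rotation hiding the similarity
Q_rot = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
Y2 = Y1 @ Q_rot + 1e-3 * rng.standard_normal((6, 2))   # rotated and slightly perturbed

Y2_aligned = align_factor(Y1, Y2)
dist_before = np.linalg.norm(Y1 - Y2)          # large, dominated by the rotation
dist_after = np.linalg.norm(Y1 - Y2_aligned)   # small, of the size of the perturbation
```

After alignment, `Y1.T @ Y2_aligned` is symmetric, so the difference `Y2_aligned - Y1` lies in the horizontal space at `Y1`, in line with the remark.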
3.2 A time interval for the factorized problem
We now return to the factorized problem formulation (BM). Let be an optimal solution of (BM) at some fixed time point (so that and ). Based on the above propositions we are able to state a result on the allowed time interval for which the factorized problem (BM) is guaranteed to admit unique solutions on the horizontal space corresponding to the original problem (SDP). For this, exploiting the smoothness of the curve , we first define
(3.10) |
a uniform bound on the time derivative, as well as
(3.11) |
on the smallest eigenvalue of , are available for . Notice that the existence of such bounds is without any further loss of generality: the existence of follows from (C1), which guarantees that is a smooth curve, while the existence of is guaranteed by (C2), since has a constant rank.
Theorem 3.5.
Proof.
The results of this section motivate the definition of a version of (BM) restricted to , which we provide in the next section.
4 Path following the trajectory of solutions
In this section, we present a path-following procedure for computing a sequence of approximate solutions at different time points that tracks a trajectory of solutions to the Burer–Monteiro reformulation (BM). From this sequence we are then able to reconstruct a corresponding sequence of approximate solutions tracking the trajectory of solutions for the full space TV-SDP problem (SDP). The path-following method is based on iteratively solving the linearized KKT system. Given an iterate on the path, we explained in the previous section how to eliminate the problem of nonuniqueness of the path in a small time interval by considering problem (BM) restricted to the horizontal space . We now need to ensure that this also guarantees that the linearized KKT system admits a unique solution. We show in Theorem 4.2 that this is indeed guaranteed under standard regularity assumptions on the original problem (SDP). This is a remarkable fact of somewhat independent interest.
4.1 Linearized KKT conditions and second-order sufficiency
Given an optimal solution at time , we aim to find a solution at time . By the results of the previous section, the next solution can be expressed in a unique way as
where is in the horizontal space , provided that is small enough.
We define the following maps:
(4.1) | ||||
By definition, if and only if . For symmetry reasons we use the equivalent condition (which reflects the fact that is actually a linear space).
To find the new iterate we hence consider the problem
(BM) | ||||||
s.t. | ||||||
This is a quadratically constrained quadratic problem whose Lagrangian is
(4.2) |
with multipliers and . The KKT conditions of problem (BM) are
(4.3) | |||
Hence, (4.3) reads explicitly as
The linearization of (4.3) at leads to a linear system
(4.4) |
where denotes the derivative of at . Note that it actually does not depend on , but we will keep this notation for consistency. As a linear operator on , can be written in block matrix notation as follows,
where from (4.1) and (4.2) one derives
For later reference, observe that as a bilinear form reads
Solving (4.4) for obtaining updates is equivalent to applying one step of Newton’s method to the KKT system (4.3) (Lagrange–Newton method).
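To illustrate the Lagrange–Newton idea in isolation, the following sketch tracks the solution of a toy time-varying quadratically constrained quadratic problem, $\min_x \tfrac12\|x - c(t)\|^2$ subject to $\|x\|^2 = 1$, with one Newton step on the KKT system per time step. This toy problem, its data $c(t)$, the sign convention of the Lagrangian, and all names are assumptions made for illustration; it is not the system (4.4) and involves no horizontal space constraint.

```python
import numpy as np

def c(t):
    """Time-varying data of the toy problem (assumed for illustration)."""
    return np.array([2.0 * np.cos(t), 2.0 * np.sin(t), 1.0])

def kkt_residual(x, lam, t):
    # Lagrangian: 0.5*||x - c(t)||^2 - lam * 0.5*(x.x - 1)
    return np.concatenate([(1.0 - lam) * x - c(t), [0.5 * (x @ x - 1.0)]])

def kkt_jacobian(x, lam):
    d = x.size
    J = np.zeros((d + 1, d + 1))
    J[:d, :d] = (1.0 - lam) * np.eye(d)        # Hessian of the Lagrangian in x
    J[:d, d] = -x                              # derivative with respect to lam
    J[d, :d] = x                               # gradient of the constraint
    return J

def track(T=1.0, dt=0.01):
    """One Lagrange-Newton step per time step, started at the exact solution for t = 0."""
    t = 0.0
    x = c(0.0) / np.linalg.norm(c(0.0))
    lam = 1.0 - np.linalg.norm(c(0.0))
    for _ in range(int(round(T / dt))):
        t += dt
        step = np.linalg.solve(kkt_jacobian(x, lam), -kkt_residual(x, lam, t))
        x, lam = x + step[:-1], lam + step[-1]
    return t, x, lam

t_end, x_end, lam_end = track()
err = np.linalg.norm(x_end - c(t_end) / np.linalg.norm(c(t_end)))   # exact solution is c/||c||
```

Because each step starts close to the solution at the new time, a single Newton step per time point keeps the tracking error small for this toy problem, with an error that scales with the square of the step size.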
Our aim in this subsection is to show that for small enough the system (4.4) is uniquely solvable when is a KKT-pair for the overparametrized problem (BM). Since the system is continuous in , we can do that by showing that it admits a unique solution for . This corresponds to proving second-order sufficient conditions for the optimality of problem (BM) for . Interestingly, it is possible to relate this to standard regularity hypotheses on the original semidefinite problem (SDP). For this we first need a uniqueness statement on the Lagrange multiplier .
Lemma 4.1.
Given an optimal solution to (SDP), suppose that is a unique (see consequence (C1)), primal nondegenerate (see Definition 2.3 and assumption (A3)) solution. Then there is a unique optimal Lagrangian multiplier for (BM) independent of the choice of in the orbit . Moreover, is the unique dual solution to (D-SDP).
Proof.
We start by recalling that the optimal set for (BM) coincides with . Since the KKT conditions for (BM) are just
(and ), the set of all optimal dual multipliers for (BM) is
To show that this set is a singleton, it suffices to prove that the homogeneous equation has only the zero solution. By (2.2), primal nondegeneracy for can read as
where . Noticing that implies , we get that and thus since is injective by assumption (A2). To prove the second statement, observe that by primal nondegeneracy (D-SDP) has a unique solution corresponding, by assumption (A2), to a unique dual multiplier vector (see Theorem 7 in [4]). Furthermore, satisfies by (2.1). Since has full column rank if is chosen equal to , this implies that . From the first statement it then follows that . ∎
We can now state and prove the main result of this subsection.
Theorem 4.2.
Let be a strictly complementary (see Definition 2.2) optimal primal-dual pair of solutions to (SDP)-(D-SDP) such that is a primal nondegenerate solution. Let be the unique corresponding Lagrange multiplier for (BM) according to Lemma 4.1. Then the triple is a KKT triple for (BM) at (that is, ) and fulfills the second-order sufficient conditions:
(4.5) |
for all satisfying and . In particular, is invertible.
Proof.
Since by the KKT conditions for (BM) and , it is obvious that . It is well known that the linearized KKT system (4.4) admits a unique solution if (and only if) the second-order sufficient conditions (4.5) hold; see, e.g., [43, Lemma 16.1]. Since is an optimal solution for the original primal-dual pair of SDPs and hence satisfies the second-order necessary conditions for optimality (that is, ), (4.5) holds with “$\geq$”. Assume that
for some satisfying and . Since is positive semidefinite, the columns of must belong to the kernel of . By strict complementarity they hence belong to the column space of , which is equal to the column space of . Therefore for some matrix . Consider now the matrix
depending on a real parameter . Clearly, and, for nonzero small enough, is positive semidefinite. Furthermore, for a suitable choice of the sign of , we have . Since is the unique solution of (SDP), this implies and thus must be zero. Since Proposition 3.1 yields , and this completes the proof. ∎
Corollary 4.3.
Clearly, this is only a qualitative result. An upper bound for feasible could be expressed in terms of the spectral norm of the inverse of using perturbation arguments. This would require a lower bound on the absolute value of the eigenvalues of . In this context, we should clarify that the eigenvalues, and hence also the condition number of (for sufficiently small as above), do not depend on the particular choice of in the orbit . This is obviously also relevant from a practical perspective. To see this, note that as a bilinear form (on ) reads
For any fixed one therefore has
with the unitary linear operator on . It follows that and have the same eigenvalues.
However, our proof of Theorem 4.2 is by contradiction and hence does not provide an obvious lower bound on the radius of invertibility of . Here we do not intend to investigate this in more depth. In the error analysis conducted later we will essentially assume to have such a bound available (cf. Lemma 4.5).
4.2 A path-following predictor-corrector algorithm
We now thoroughly describe the path-following predictor-corrector algorithm that we propose for tracking the trajectory of solutions to (SDP). It includes an optional adaptive step size tuning step which is based on measuring the residual of the optimality conditions, defined as
(RES) |
The residual expresses the maximal component-wise violation of the optimality KKT conditions for the problem (BM) and is therefore a suitable error measure. Indeed (see, e.g., [57, Theorems 3.1 and 3.2]), if the second-order sufficiency condition for optimality holds at , then there are constants such that for all with one has
Here and in the following, we use the norm .
The overall procedure is displayed as Algorithm 1 below. Given a TV-SDP of the form (SDP), parameterized over a time interval $[0,T]$, the inputs are an approximate initial primal-dual solution pair to (SDP)–(D-SDP) and an initial step size $\Delta t$. At each iteration the current iterate is used to construct the linear system (4.4), which is then solved, returning the updates and . The presented version of the algorithm also includes a procedure for tuning the step size that can be activated through the Boolean variable STEPSIZE_TUNING and is supposed to ensure that the residual threshold $\varepsilon$ is satisfied at every time step. Specifically, if for a time step the threshold is violated, the step size is reduced by a factor $\gamma_1 \in (0,1)$ and a more accurate solution is obtained by solving the linearized KKT system (4.4) for the reduced time step. On the other hand, to avoid unnecessarily small steps, the step size is increased after every successful step by a factor $\gamma_2 > 1$ (but is never made larger than the initial step size). If the step size tuning is deactivated, the algorithm just runs with the constant step size $\Delta t$ instead. Note that Algorithm 1 tracks both the primal solution and the dual solution .
Input: an initial approximate primal-dual solution to (SDP)–(D-SDP)
initial step size $\Delta t$
Boolean variable STEPSIZE_TUNING
step size tuning parameters $\gamma_1 \in (0,1)$, $\gamma_2 > 1$
residual tolerance $\varepsilon > 0$
Output: solutions to (SDP) for a sequence of time points in $[0,T]$
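The step size tuning logic described above can be sketched independently of the SDP setting. The following is a hedged illustration only: the body of Algorithm 1 is not reproduced here, the function and parameter names are hypothetical, the safeguard `dt_min` and the truncation of the last step at the final time are our own additions, and the demonstration problem (tracking $z(t) = \sqrt{1+t}$ as the root of $z^2 - (1+t)$) merely stands in for the linearized KKT solve.

```python
import math

def track_with_stepsize_tuning(step_fn, residual_fn, z0, T, dt0,
                               gamma1=0.5, gamma2=1.5, eps=1e-3, dt_min=1e-10):
    """Track a solution over [0, T], shrinking the step when the residual exceeds eps."""
    t, z, dt = 0.0, z0, dt0
    path = [(t, z)]
    while T - t > 1e-12:
        dt = min(dt, T - t)                    # do not step past the final time
        z_new = step_fn(z, t + dt)             # stands in for one linearized solve
        if residual_fn(z_new, t + dt) > eps and dt > dt_min:
            dt *= gamma1                       # reject the step and retry with a smaller one
            continue
        t, z = t + dt, z_new                   # accept the step
        path.append((t, z))
        dt = min(gamma2 * dt, dt0)             # enlarge, but never beyond the initial step
    return path

def newton_step(z, t):
    """One Newton step for z^2 - (1 + t) = 0 at the new time t."""
    return z - (z * z - (1.0 + t)) / (2.0 * z)

def residual(z, t):
    return abs(z * z - (1.0 + t))

path = track_with_stepsize_tuning(newton_step, residual, z0=1.0, T=2.0, dt0=0.5)
t_end, z_end = path[-1]
```

In this toy run the initial step of 0.5 is rejected several times before a step is accepted, after which the step size fluctuates around the largest value compatible with the tolerance.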
4.3 Error analysis
We investigate the algorithm without step size tuning. The main goal of the following error analysis is to show that the computed , where , remain close to the exact solutions , if properly initialized. The logic of the proof is similar to standard path-following methods based on Newton’s method, e.g., [22]. The specific form of our problem requires some additional considerations that allow for more precise quantitative bounds depending on the problem constants.
Throughout this section, is an optimal primal-dual pair of solutions to (SDP)–(D-SDP) satisfying the five assumptions (A1)–(A5), so that it is strictly complementary (see Definition 2.2) and such that is primal nondegenerate. Notice that the choice of factor can be arbitrary, since it does not affect any of the subsequent statements. In Lemma 4.1 and its proof, we have seen that for every the unique Lagrange multiplier satisfies , that is,
with being the pseudo-inverse of . By assumption (A5), and depend smoothly on and so does , since is surjective for all by assumption (A2). Also, by Theorem 2.4, is smooth. Therefore the curve is smooth. Since the algorithm operates in the space, our implicit goal is to show that the iterates stay close to the set
containing the optimal primal-dual trajectories in the Burer–Monteiro factorization.
Lemma 4.4.
The set is compact.
Proof.
As the curve is continuous, it suffices to prove that the set is compact. Since and is smooth, it is bounded. To see that the set is closed, let be a convergent sequence with limit such that for some . By passing to a subsequence, we can assume . Then obviously , which shows that is in the set. ∎
We consider the norm on defined by . The induced operator norm is denoted .
Lemma 4.5.
There exists a constant such that
(4.6) |
for all .
Proof.
On its open domain of definition, the map is continuous. By Theorem 4.2, the compact set is contained in that domain. Therefore, achieves its maximum on . ∎
Lemma 4.6.
For any and , the map is Lipschitz continuous in the operator norm on . Specifically,
for all and , where is the operator norm of .
Proof.
Since is assumed to be continuous, the constant satisfies the uniform Lipschitz condition
(4.7) |
for all and , independent of the choice of . In what follows, we proceed with using (4.7) and (4.6), without further investigating the sharpest possible bounds.
In addition, let be a uniform lower bound on the smallest positive eigenvalue as in (3.11). Furthermore, we now also assume a uniform upper bound
on the spectral norm of . Finally, let as in (3.10) and since the curve is smooth, the constant
(4.8) |
is also well-defined.
With the necessary constants at hand, we are now in the position to state our main result on the error analysis. The following theorem shows that we can bound the distance between the iterates of Algorithm 1 and the set of solutions to (BM) provided the initial point is close enough to the set of initial solutions and the step size is small enough. Here we employ again the natural distance measure between the orbits and , cf. Remark 3.4.
Theorem 4.7.
Let and be small enough such that the following three conditions are satisfied:
(4.9) | |||
(4.10) | |||
(4.11) |
Assume for the initial point that
(4.12) |
Then Algorithm 1 is well-defined and for all the iterates satisfy
It then holds that
for all .
Notice that the left side of (4.11) is for , whereas the right side is only . Therefore for and small enough, (4.11) will be satisfied. Furthermore, a sufficient condition for (4.12) to hold is that
and
which easily follows from (3.9).
Proof.
We will investigate one step of the algorithm and apply an induction hypothesis that at time point there exists satisfying
We aim to show that for sufficiently small and the next iterate in the algorithm is well-defined and satisfies the same estimate
with an exact solution . The proof of the theorem then follows by induction over the steps in the algorithm.
We first claim that there exists an exact solution in the horizontal space of , that is, and . Indeed, using (4.9) we have
This yields
Thus, Proposition 3.3 states the existence of as desired. We note for later use that by (3.9) it satisfies
(4.13) | ||||
The matrix is an exact solution of (BM), and by Theorem 4.2 there is a unique Lagrange multiplier such that . By construction, the next iterate in the algorithm is obtained from one step of the Newton method for solving this equation with starting point . In light of (4.6) and (4.7), standard results (e.g. Theorem 1.2.5 in [42]) on the Newton method yield that under the condition
one step of the method is well-defined, i.e., is invertible, and satisfies
In particular, using would give the desired result
Therefore, we need to ensure that
is satisfied. Here the second inequality is just condition (4.10). We now show that (4.11) is a sufficient condition for the first inequality. Clearly, using (4.8), we have
Together with (4.13) this gives
Now (4.11) ensures the desired estimate for the right-hand side and the proof is completed. ∎
5 Numerical experiments on Time-Varying Max Cut
In this section, we compare the tracking of the trajectory of solutions to TV-SDP via Algorithm 1 with interior-point methods (IPMs) used to track the same trajectory by solving the problem at discrete time points. In our experiments, we used the implementation of the homogeneous and self-dual algorithm [6, 24] from the MOSEK Optimization Suite, version 9.3 [40]. Furthermore, in order to provide a comparison with an alternative warm-start approach, we performed numerical experiments using the Splitting Conic Solver (SCS), version 3.2.2 [46]. This package implements the first-order method presented in [44, 45], which uses an operator splitting method, the alternating directions method of multipliers, to solve the homogeneous self-dual embedding. We show the algorithm proposed in this paper can perform better, in terms of both accuracy and runtime, than repeated runs of IPM for time-invariant SDP and than the warm-started SCS.
Given a weighted graph , the Max-Cut problem is a well-known problem in graph theory. There, we wish to find a binary partition of the vertices in (also known as a cut) of maximal weight. The weight of the cut is defined as the sum of the weights of the edges in connecting the two subsets of the partition. This problem can be formulated as the following quadratically-constrained quadratic problem
(MC) | ||||||
s.t. |
where is the number of vertices of the graph, is the weight of the edge connecting vertices and , and variable takes binary values according to the subset to which vertex is assigned. This problem can be relaxed to an SDP of the form
(MCR) | ||||||
s.t. | ||||||
where is the weights matrix whose entry is given by , see [25]. Note that the number of constraints is equal to the size of the variable matrix. Randomized approximation algorithms for (MC) exploiting the convex relaxation (MCR) deliver solutions with a performance ratio of and are known to be the best poly-time algorithms to approximately solve (MC).
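To make the link between a cut and the relaxed variable concrete, the following sketch evaluates the weight of a cut and checks that its rank-one lift is feasible for the relaxation. It assumes the standard Goemans–Williamson form, in which the constraints fix the diagonal of the matrix variable to one; the helper `cut_weight` and the small example graph are hypothetical.

```python
import numpy as np


def cut_weight(W, x):
    """Weight of the cut encoded by x in {-1, +1}^n for a symmetric W.

    Each edge {i, j} appears twice in the symmetric matrix W and
    contributes 1 - x_i x_j = 2 when it is cut, hence the factor 1/4.
    """
    return 0.25 * np.sum(W * (1.0 - np.outer(x, x)))


# Small weighted triangle (hypothetical data).
W = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 3.0],
              [2.0, 3.0, 0.0]])
x = np.array([1.0, 1.0, -1.0])      # vertices {0, 1} versus vertex {2}

value = cut_weight(W, x)            # edges (0,2) and (1,2) are cut

# Rank-one lift X = x x^T: unit diagonal and positive semidefinite.
X = np.outer(x, x)
lifted_value = 0.25 * np.sum(W * (1.0 - X))
min_eig = np.linalg.eigvalsh(X).min()
```

Dropping the rank-one requirement on `X` while keeping the unit diagonal and positive semidefiniteness gives the convex relaxation.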
In this paper, we adopt a time-varying version of (MCR) as a benchmark, where the data matrix depends on a time parameter . (We point out that this differs from the recently studied variant [37, 31] with edge insertions and deletions, which could be seen as discontinuous functions of time.)
In our experiment, is obtained as a random linear perturbation of a sparse weight matrix with density . Specifically,
where the entries of are randomly generated with a normal distribution having mean and standard deviation , while the entries in are chosen with a normal distribution having . Both matrices have the same sparsity structure. We refer to such a problem as the time-varying max-cut relaxation (TV-MCR), which can be thought of as a convex relaxation for a max-cut problem where the edge weights of a given graph change over time.
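A data generator of this kind could be sketched as follows. This is an illustration only: it assumes the perturbation has the form W(t) = W0 + t W1, and the size, density, means, and standard deviations below are placeholder values, not the ones used in the experiments. The helper `random_symmetric_sparse` is hypothetical.

```python
import numpy as np


def random_symmetric_sparse(n, density, mean, std, rng, mask=None):
    """Symmetric matrix with zero diagonal and normally distributed entries.

    If no sparsity mask is given, a symmetric one is drawn at random
    with the requested density; otherwise the given mask is reused.
    """
    if mask is None:
        upper = np.triu(rng.random((n, n)) < density, k=1)
        mask = upper | upper.T
    vals = np.triu(rng.normal(mean, std, size=(n, n)), k=1)
    vals = vals + vals.T                       # symmetrize
    return np.where(mask, vals, 0.0), mask


rng = np.random.default_rng(0)
n = 8                                           # placeholder size
W0, mask = random_symmetric_sparse(n, density=0.5, mean=1.0, std=0.1, rng=rng)
# The perturbation direction reuses the sparsity pattern of W0.
W1, _ = random_symmetric_sparse(n, density=0.5, mean=0.0, std=0.1,
                                rng=rng, mask=mask)


def W(t):
    """Time-varying weight matrix, assumed linear in t for this sketch."""
    return W0 + t * W1
```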
All the experiments were conducted on a personal computer with a 1.6 GHz Intel Core i5 dual-core processor with 16GB RAM, using a Python implementation of our path-following algorithm. The main goal was to illustrate the potential computational benefits of our algorithm, so we did not attempt to provide the most efficient implementation. The code (https://github.com/antoniobellon/burer-monteiro-path-following, Eclipse Public License 2.0) as well as the data and experimental results (https://zenodo.org/record/7769225) are available online.
We performed experiments on instances of the TV-MCR problem with vertices and tracked the trajectory of solutions for . Among these samples, we included 10 instances of TV-MCR for which the rank of the solution is not constant, hence violating our assumption (A4). This was done by sampling the rank (estimated with a tolerance on zero eigenvalues of ) of the solutions obtained using MOSEK over a 10-step subdivision of the interval and selecting ten cases in which we observed a change in the rank. Using the same procedure, we checked that for the remaining 100 instances, the rank of the solution is constant along the trajectory.
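The rank estimate used in this screening step can be sketched as a count of eigenvalues above a threshold. The function name `numerical_rank` and the tolerance value are assumptions made for illustration; the tolerance used in the experiments is the one stated above.

```python
import numpy as np


def numerical_rank(X, tol=1e-6):
    """Number of eigenvalues of the symmetric matrix X that exceed tol."""
    eigvals = np.linalg.eigvalsh((X + X.T) / 2.0)   # symmetrize against round-off
    return int(np.sum(eigvals > tol))


# A positive semidefinite matrix of exact rank 2, built from a 3 x 2 factor.
Y = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
X = Y @ Y.T

rank_exact = numerical_rank(X)
# A perturbation below the tolerance does not change the estimate.
rank_perturbed = numerical_rank(X + 1e-9 * np.eye(3))
```

Evaluating such an estimate at a few sample times and comparing the results is enough to flag instances whose rank changes along the trajectory, although a rank change between sample times can go undetected.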
First, we applied Algorithm 1 without step size adjustment, hence setting step size_TUNING to FALSE, and using step sizes , so that in each experiment 10, 100, and 1000 iterations are performed for each choice of the step size (see Figures 1 and 2). The factor dimension is chosen equal to the rank of an initial solution obtained using MOSEK with relative gap termination tolerances set to . Its distribution is shown in Table 1.
Rank of the initial solution | 4 | 5 | 6 | 7 |
---|---|---|---|---|
# occurrences | 2 | 39 | 53 | 6 |
![Refer to caption](extracted/5337857/PLOTS/residuals.png)
Figure 1 depicts the distribution over 100 instances of the average residuals along the tracking of the solution on the time interval , as a function of the used step size. For each whisker plot, the error bars span the interval from the minimum to the maximum, while the box spans the first quartile to the third quartile, with a horizontal line at the median.
In the left plot, the light green dots correspond to the average residuals of the 10 rank-changing instances; the right plot instead excludes these degenerate instances from the data set. Notice that these points correspond to TV-SDP instances that do not satisfy our assumption (A4). The green plot shows the average residual obtained by tracking the solution with Algorithm 1; the orange plot shows the average residual when the tracking is done using SCS with relative and absolute feasibility tolerances set to , warm-started with the current solution; finally, the bordeaux plot shows the average residual when the tracking is done using MOSEK IPM [40] with the relative gap termination tolerances set to .
The residual of an SDP primal-dual solution is defined, in analogy to (RES), as
By choosing a suitable step size (in our experiments order ), Algorithm 1 yields an average residual accuracy that is comparable to the one obtained using standard IPMs with very small relative gap termination tolerance. For a step size of order , our algorithm exhibits a residual precision that is 100 times more accurate than both IPM and warm-started SCS. Furthermore, as we see next, this accuracy is reached much faster with our approach.
In Figure 2 we plot the distributions of the runtimes of Algorithm 1 (green) as a function of the step size, as well as the distributions of the runtimes of IPM (bordeaux) used with relative gap termination tolerances and of the warm-started SCS (orange) to track the solutions trajectory at a constant step size resolution.
Remarkably, for each step size that we tested, the mean runtime of Algorithm 1 is about ten times smaller than that of both SCS and MOSEK IPM, indicating the competitive computational performance of our algorithm.
![Refer to caption](extracted/5337857/PLOTS/runtimes_bigger.png)
![Refer to caption](extracted/5337857/PLOTS/gridplot_1.png)
![Refer to caption](extracted/5337857/PLOTS/gridplot_2.png)
Finally, we apply Algorithm 1 to the same set of TV-MCR problems allowing for a step size adjustment (setting step size_TUNING to TRUE). In order to provide a fair comparison with MOSEK IPM, we fixed five subdivisions of the interval in a grid of, respectively, 20, 40, 60, 80, and 100 equidistant points. For each grid, at each time point, we used MOSEK with a relative gap termination tolerance of to obtain the corresponding TV-SDP solution, recording the runtime and the average residual over the tracking of each instance. For each grid, we then ran our algorithm with step size adjustment in order to ensure the same average residual accuracy guaranteed by MOSEK, additionally forcing the path-following procedure to hit the grid points. In this way, we ensure that our procedure has the same accuracy as MOSEK both in terms of the solution residual and of the tracking resolution.
Figure 3 shows the distributions of the runtimes as a function of the number of grid points of both Algorithm 1 (green) and IPM with two different relative gap termination tolerances: (Figure 3(a)) and (Figure 3(b)).
Encouragingly, we observe that we can ensure both the same accuracy and the same tracking resolution as MOSEK at a smaller average runtime. The flat profile of the green plot on the right arises because, in order to match the residual accuracy of the IPM, the path-following procedure must use considerably more time points than the grid contains, so that its runtime is essentially independent of the number of grid points; for the plot on the left, it is instead sufficient for Algorithm 1 to follow the grid.
6 Conclusion
In this paper, we proposed an algorithm for solving time-varying SDPs based on a path-following predictor-corrector scheme for the Burer–Monteiro factorization. The restriction to a horizontal space ensures that the linearized KKT conditions system is uniquely solvable under standard regularity assumptions on the TV-SDP problem, thus leading to a well-defined path-following procedure with rigorous error bounds on the distance from the optimal trajectory. Preliminary numerical experiments on a time-varying version of the max-cut SDP relaxation suggest that our algorithm is competitive both in terms of runtime and accuracy when compared to the application of standard IPMs. Future work should explore the applicability and relative merits of our approach in further applications.
So far we have assumed that the rank of the true solution curve is known and remains constant. While this is certainly appropriate for a rigorous analysis as conducted in this work, it might be restrictive in practice. An important extension hence would be to develop rank-adaptive versions of our path-following approach that are able to detect and adjust the appropriate rank in a Burer–Monteiro factorization, for example, by monitoring the smallest singular values of the matrices .
Another important aspect is the initialization of the method, which requires an accurate SDP solution and is currently not based on Burer–Monteiro factorization, thus undermining the computational efficiency of the whole approach. The obvious way out is to also solve the initial time problem using the factorized approach [16]. The metaalgorithm presented in [36] even does this in a rank-adaptive way. Although this is a nonconvex problem, several works, including also [13, 48, 19], have considered Burer–Monteiro schemes with guaranteed and certifiable convergence to a globally optimal low-rank factor under mild conditions, making this a reliable approach in practice.
Acknowledgments
The research leading to these results received funding from the OP RDE under Grant Agreement CZ.02.1.01/0.0/0.0/16_019/0000765. The first author gratefully acknowledges the support of the Czech Science Foundation (grant 22-15524S). The authors also thank two anonymous referees for their helpful comments.
References
- [1] S. Aaronson, X. Chen, E. Hazan, S. Kale, and A. Nayak. Online learning of quantum states. J. Stat. Mech. Theory Exp., 2019, pages 124019, 14, 2019.
- [2] P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization algorithms on matrix manifolds. Princeton University Press, Princeton, NJ, 2008.
- [3] A. A. Ahmadi and B. El Khadir. Time-varying semidefinite programs. Math. Oper. Res., 46(3):1054–1080, 2021.
- [4] F. Alizadeh, J.-P. A. Haeberly, and M. L. Overton. Complementarity and nondegeneracy in semidefinite programming. Math. Program., 77(2, Ser. B):111–128, 1997.
- [5] E. L. Allgower and K. Georg. Introduction to numerical continuation methods. SIAM, Philadelphia, 2003.
- [6] E. D. Andersen, C. Roos, and T. Terlaky. On implementing a primal-dual interior-point method for conic quadratic optimization. Math. Program., 95:249–277, 2003.
- [7] E. J. Anderson. A Continuous Model For Job-Shop Scheduling. PhD thesis, University of Cambridge, Cambridge, 1978.
- [8] R. Balan and C. B. Dock. Lipschitz analysis of generalized phase retrievable matrix frames. SIAM J. Matrix Anal. Appl., 43(3):1518–1571, 2022.
- [9] A. Barvinok. Problems of distance geometry and convex properties of quadratic maps. Discrete Comput. Geom., 13(2):189–202, 1995.
- [10] A. Barvinok. A remark on the rank of positive semidefinite matrices subject to affine constraints. Discrete Comput. Geom., 25(1):23–31, 2001.
- [11] R. Bellman. Bottleneck problems and dynamic programming. Proc. Nat. Acad. Sci. USA, 39:947–951, 1953.
- [12] A. Bellon, D. Henrion, V. Kungurtsev, and J. Mareček. Time-varying semidefinite programming: Geometry of the trajectory of solutions. arXiv:2104.05445, 2021.
- [13] N. Boumal. A Riemannian low-rank method for optimization over semidefinite matrices with block-diagonal constraints. arXiv:1506.00575, 2015.
- [14] N. Boumal, V. Voroninski, and A. Bandeira. The non-convex Burer-Monteiro approach works on smooth semidefinite programs. In D. Lee et al., editor, Advances in Neural Information Processing Systems, volume 29, pages 2757–2765. Curran Associates, Inc., 2016.
- [15] N. Boumal, V. Voroninski, and A. S. Bandeira. Deterministic guarantees for Burer-Monteiro factorizations of smooth semidefinite programs. Comm. Pure Appl. Math., 73(3):581–608, 2020.
- [16] S. Burer and R. D. C. Monteiro. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Math. Program., 95(2, Ser. B):329–357, 2003.
- [17] S. Burer and R. D. C. Monteiro. Local minima and convergence in low-rank semidefinite programming. Math. Program., 103(3, Ser. A):427–444, 2005.
- [18] D. Cifuentes. On the Burer-Monteiro method for general semidefinite programs. Optim. Lett., 15(6):2299–2309, 2021.
- [19] D. Cifuentes and A. Moitra. Polynomial time guarantees for the Burer-Monteiro method. In S. Koyejo et al., editor, Advances in Neural Information Processing Systems, volume 35, pages 23923–23935. Curran Associates, Red Hook, NY, 2022.
- [20] M. Colombo, J. Gondzio, and A. Grothey. A warm-start approach for large-scale stochastic linear programs. Math. Program., 127(2, Ser. A):371–397, 2011.
- [21] G. B. Dantzig. Large-scale systems optimizations with application to energy. Technical report SOL 77-3, 4 1977.
- [22] Q. T. Dinh, C. Savorgnan, and M. Diehl. Adjoint-based predictor-corrector sequential convex programming for parametric nonlinear optimization. SIAM J. Optim., 22(4):1258–1284, 2012.
- [23] A. Engau, M. F. Anjos, and A. Vannelli. On interior-point warmstarts for linear and combinatorial optimization. SIAM J. Optim., 20(4):1828–1861, 2010.
- [24] R. M. Freund. On the behavior of the homogeneous self-dual model for conic convex optimization. Math. Program., 106(3, Ser. A):527–545, 2006.
- [25] M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM, 42(6):1115–1145, 1995.
- [26] D. Goldfarb and K. Scheinberg. On parametric semidefinite programming. In Proceedings of the Stieltjes Workshop on High Performance Optimization Techniques (HPOPT ’96), pages 361–377. Elsevier, Amsterdam, Appl. Numer. Math. 29, 1999.
- [27] J. Gondzio and A. Grothey. Reoptimization with the primal-dual interior point method. SIAM J. Optim., 13(3):842–864, 2002.
- [28] J. Gondzio and A. Grothey. A new unblocking technique to warmstart interior point methods based on sensitivity analysis. SIAM J. Optim., 19(3):1184–1210, 2008.
- [29] J. Guddat, F. Guerra Vazquez, and H. T. Jongen. Parametric optimization: singularities, pathfollowing and jumps. B. G. Teubner, Stuttgart; John Wiley & Sons, Ltd., Chichester, 1990.
- [30] J. D. Hauenstein, A. Mohammad-Nezhad, T. Tang, and T. Terlaky. On computing the nonlinearity interval in parametric semidefinite optimization. Math. Oper. Res., 47(4):2989–3009, 2022.
- [31] M. Henzinger, A. Noe, and C. Schulz. Practical fully dynamic minimum cut algorithms. In 2022 Proceedings of the Symposium on Algorithm Engineering and Experiments (ALENEX), SIAM, Philadelphia, pages 13–26, 2022.
- [32] R. A. Horn and C. R. Johnson. Matrix analysis. Cambridge University Press, Cambridge, second edition, 2013.
- [33] J. Im and H. Wolkowicz. A strengthened Barvinok-Pataki bound on SDP rank. Oper. Res. Lett., 49(6):837–841, 2021.
- [34] F. Jarre. An interior-point method for minimizing the maximum eigenvalue of a linear combination of matrices. SIAM J. Control Optim., 31(5):1360–1377, 1993.
- [35] H. Jiang, T. Kathuria, Y. T. Lee, S. Padmanabhan, and Z. Song. A faster interior point method for semidefinite programming. In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science, pages 910–918. IEEE Computer Society, Los Alamitos, CA, 2020.
- [36] M. Journée, F. Bach, P.-A. Absil, and R. Sepulchre. Low-rank optimization on the cone of positive semidefinite matrices. SIAM J. Optim., 20(5):2327–2351, 2010.
- [37] E. Kao, V. Gadepally, M. Hurley, M. Jones, J. Kepner, S. Mohindra, P. Monticciolo, A. Reuther, S. Samsi, W. Song, D. Staheli, and S. Smith. Streaming graph challenge: Stochastic block partition. In 2017 IEEE High Performance Extreme Computing Conference (HPEC), IEEE, Piscataway, NJ, pages 1–12, 2017.
- [38] J. Lavaei and S. H. Low. Zero duality gap in optimal power flow problem. IEEE Transactions on Power Systems, 27(1):92–107, 2012.
- [39] E. Massart and P.-A. Absil. Quotient geometry with simple geodesics for the manifold of fixed-rank positive-semidefinite matrices. SIAM J. Matrix Anal. Appl., 41(1):171–198, 2020.
- [40] MOSEK ApS. The MOSEK optimization toolbox for MATLAB manual. Version 9.3., 2019.
- [41] Y. Nazarathy and G. Weiss. Near optimal control of queueing networks over a finite time horizon. Ann. Oper. Res., 170:233–249, 2009.
- [42] Y. Nesterov. Lectures on convex optimization. Springer, Cham, Switzerland, 2018.
- [43] J. Nocedal and S. Wright. Numerical optimization. Springer, New York, second edition, 2006.
- [44] B. O’Donoghue. Operator splitting for a homogeneous embedding of the linear complementarity problem. SIAM J. Optim., 31(3):1999–2023, 2021.
- [45] B. O’Donoghue, E. Chu, N. Parikh, and S. Boyd. Conic optimization via operator splitting and homogeneous self-dual embedding. J. Optim. Theory Appl., 169(3):1042–1068, 2016.
- [46] B. O’Donoghue, E. Chu, N. Parikh, and S. Boyd. SCS: Splitting Conic Solver, version 3.2.2. https://github.com/cvxgrp/scs, Nov. 2022.
- [47] G. Pataki. On the rank of extreme matrices in semidefinite programs and the multiplicity of optimal eigenvalues. Math. Oper. Res., 23(2):339–358, 1998.
- [48] D. M. Rosen. Scalable low-rank semidefinite programming for certifiably correct machine perception. In Algorithmic Foundations of Robotics XIV, pages 551–566. Springer, Cham, Switzerland, 2021.
- [49] D. M. Rosen, L. Carlone, A. S. Bandeira, and J. J. Leonard. A certifiably correct algorithm for synchronization over the special Euclidean group. In Algorithmic Foundations of Robotics XII, pages 64–79. Springer, Cham, Switzerland, 2020.
- [50] B. A. Schmitt. Perturbation bounds for matrix square roots and Pythagorean sums. Linear Algebra Appl., 174:215–227, 1992.
- [51] A. Skajaa, E. D. Andersen, and Y. Ye. Warmstarting the homogeneous and self-dual interior point method for linear and conic quadratic problems. Math. Program. Comput., 5(1):1–25, 2013.
- [52] F. Teren. Minimum time acceleration of aircraft turbofan engines by using an algorithm based on nonlinear programming. In NASA Technical Memorandum TM-73741, Lewis Research Center, Cleveland, Ohio, September, 1977.
- [53] L. Tunçel. Potential reduction and primal-dual methods. In Handbook of semidefinite programming, volume 27 of Internat. Ser. Oper. Res. Management Sci., pages 235–265. Kluwer Acad., Boston, MA, 2000.
- [54] I. Waldspurger and A. Waters. Rank optimality for the Burer-Monteiro factorization. SIAM J. Optim., 30(3):2577–2602, 2020.
- [55] X. Wang, S. Zhang, and D. D. Yao. Separated continuous conic programming: strong duality and an approximation algorithm. SIAM J. Control Optim., 48(4):2118–2138, 2009.
- [56] H. Wolkowicz, R. Saigal, and L. Vandenberghe, editors. Handbook of semidefinite programming. Kluwer Academic, Boston, MA, 2000.
- [57] S. J. Wright. An algorithm for degenerate nonlinear programming with rapid local convergence. SIAM J. Optim., 15(3):673–696, 2005.