Higher-Order Newton Methods
with Polynomial Work per Iteration

Amir Ali Ahmadi¹, Abraar Chaudhry¹, Jeffrey Zhang²

¹ Princeton University, Operations Research and Financial Engineering. AAA and AC were partially supported by the MURI award of the AFOSR and the Sloan Fellowship.
² Yale University, Department of Biomedical Informatics and Data Science.
Abstract

We present generalizations of Newton’s method that incorporate derivatives of an arbitrary order $d$ but maintain a polynomial dependence on dimension in their cost per iteration. At each step, our $d$th-order method uses semidefinite programming to construct and minimize a sum of squares-convex approximation to the $d$th-order Taylor expansion of the function we wish to minimize. We prove that our $d$th-order method has local convergence of order $d$. This results in lower oracle complexity compared to the classical Newton method. We show on numerical examples that basins of attraction around local minima can get larger as $d$ increases. Under additional assumptions, we present a modified algorithm, again with polynomial cost per iteration, which is globally convergent and has local convergence of order $d$.

Keywords. Newton’s method, tensor methods, semidefinite programming, sum of squares methods, convergence analysis.

1 Introduction

Newton’s method is perhaps one of the most well-known and prominent algorithms in optimization. In its attempt to minimize a function $f:\mathbb{R}^{n}\rightarrow\mathbb{R}$, this algorithm replaces $f$ with its second-order Taylor expansion at an iterate $x_{k}\in\mathbb{R}^{n}$ and defines the next iterate $x_{k+1}$ to be a critical point of this quadratic approximation. This critical point coincides with a minimizer of the quadratic approximation in the case where the Hessian of $f$ at $x_{k}$ is positive semidefinite.

The work required in each iteration of Newton’s method consists of solving a system of linear equations which arises from setting the gradient of the quadratic approximation to zero. This can be carried out in time that grows polynomially with the dimension $n$. Perhaps the most well-known theorem about the performance of Newton’s method is its local quadratic convergence. More precisely, under the assumptions that the second derivative of $f$ is locally Lipschitz around a local minimizer $x^{*}$, and that the Hessian at $x^{*}$ is positive definite, there exists a full-dimensional basin around $x^{*}$ and a constant $c$, such that if $x_{0}$ is in this basin, one has

\[\|x_{k+1}-x^{*}\|\leq c\|x_{k}-x^{*}\|^{2}\]

for all $k\geq 0$. We note, however, that Newton’s method is in general not globally convergent. Lack of global convergence can occur even when, in addition to the previous assumptions, $f$ is assumed to be strongly convex (see, e.g., Example 5.1 in Section 5).
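As a minimal illustration of the local quadratic convergence bound, the following sketch runs the classical Newton iteration on the univariate function $f(x)=e^{x}-x$, whose minimizer is $x^{*}=0$. The test function, the starting point, and the helper name `newton_1d` are illustrative assumptions and are not taken from the rest of the paper. For this particular function and a positive starting point, the inequality $e^{-x}\leq 1-x+x^{2}/2$ (valid for $x\geq 0$) shows that the bound holds with $c=1/2$.

```python
import math


def newton_1d(grad, hess, x0, num_iters):
    """Run the classical Newton iteration x <- x - f'(x) / f''(x) in one dimension."""
    iterates = [x0]
    x = x0
    for _ in range(num_iters):
        # The Newton step moves to the critical point of the second-order Taylor model.
        x = x - grad(x) / hess(x)
        iterates.append(x)
    return iterates


def grad(x):
    """f'(x) = exp(x) - 1 for the illustrative function f(x) = exp(x) - x."""
    return math.expm1(x)


def hess(x):
    """f''(x) = exp(x), which is positive everywhere."""
    return math.exp(x)


iterates = newton_1d(grad, hess, x0=1.0, num_iters=4)
errors = [abs(x) for x in iterates]  # distance to the minimizer x* = 0
for k in range(len(errors) - 1):
    # The ratio |x_{k+1} - x*| / |x_k - x*|^2 stays bounded (here by 1/2).
    print(k, errors[k + 1], errors[k + 1] / errors[k] ** 2)
```

The printed ratios approach $f'''(x^{*})/(2f''(x^{*}))=1/2$, which is the asymptotic constant one expects for Newton's method in one dimension.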

As higher-order Taylor expansions provide closer local approximations to the function $f$, it is natural to ask why Newton’s method limits the order of Taylor approximation to 2. The main barrier to higher-order Newton methods is the computational burden associated with minimizing polynomials of degree larger than 2 which would arise from higher-order Taylor expansions. For instance, any of the following tasks that one could consider for each iteration of a higher-order Newton method are in general NP-hard:

  (i) finding a global minimum of polynomials of even degree at least 4 (see, e.g., [39]); note that odd-degree polynomials are unbounded below,

  (ii) finding a local minimum of polynomials of degree at least 4 (see [8, Theorem 2.1]),

  (iii) finding a second-order point (i.e., a point where the gradient vanishes and the Hessian is positive semidefinite) of polynomials of degree at least 4 (see [7, Theorem 2.2]),

  (iv) finding a critical point (i.e., a point where the gradient vanishes) of polynomials of degree at least 3 (see [7, Theorem 2.1]).

In addition to matters related to computation, there are geometric distinctions between Newton’s method and higher-order analogues of it. For example, even when the function $f$ is strongly convex and the starting iterate is arbitrarily close to its minimizer, Taylor expansions of even degree and larger than 2 may not be bounded below. One can see this by examining the strongly convex univariate function $f(x)=x^{2}-x^{4}+x^{6}$ and its 4th-order Taylor expansion near the origin.
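The univariate example can be checked numerically. The sketch below evaluates the fourth-order Taylor expansion of $f(x)=x^{2}-x^{4}+x^{6}$ at an expansion point $\bar{x}$, using the closed-form derivatives of $f$; the helper names `derivatives` and `taylor4` are hypothetical and introduced only for this illustration. The leading coefficient of the expansion is $f^{(4)}(\bar{x})/4!=15\bar{x}^{2}-1$, which is negative whenever $|\bar{x}|<1/\sqrt{15}$, so the expansion is unbounded below at all such points even though $f''(x)=30x^{4}-12x^{2}+2\geq 0.8$ everywhere.

```python
import math


def derivatives(x):
    """Return [f, f', f'', f''', f''''] at x for f(x) = x^2 - x^4 + x^6."""
    return [
        x**2 - x**4 + x**6,
        2 * x - 4 * x**3 + 6 * x**5,
        2 - 12 * x**2 + 30 * x**4,
        -24 * x + 120 * x**3,
        -24 + 360 * x**2,
    ]


def taylor4(x_bar, x):
    """Evaluate the 4th-order Taylor expansion of f around x_bar at the point x."""
    h = x - x_bar
    return sum(c * h**i / math.factorial(i) for i, c in enumerate(derivatives(x_bar)))


# Strong convexity: the second derivative stays at or above 0.8 on a grid over [-2, 2].
min_hess = min(derivatives(t / 1000.0)[2] for t in range(-2000, 2001))

# Near the origin the quartic coefficient 15 * x_bar^2 - 1 is negative,
# so the Taylor model decreases without bound as |x| grows.
x_bar = 0.1
leading_coeff = derivatives(x_bar)[4] / math.factorial(4)
print(min_hess, leading_coeff, taylor4(x_bar, 10.0), taylor4(x_bar, 100.0))
```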

Despite these barriers, the question of whether one can make higher-order Newton methods tractable and in some way superior to Newton’s method has been considered at least since the work of Chebyshev [20] (see Section 1.1 for more recent literature). More specifically, the question that is of interest to us is whether it is possible to design a higher-order Newton method (i.e., a method which utilizes a Taylor expansion of degree $d>2$ in each iteration) in such a way that (i) the work per iteration grows polynomially with the dimension, and (ii) the local order of convergence grows with $d$, hence requiring fewer function evaluations as $d$ increases. In this paper, we show that this is indeed possible (Algorithm 1 and Theorem 4).

Our algorithm relies on sum of squares techniques in optimization [44, 30] and semidefinite programming and does not require the function $f$ to be convex. For any fixed degree $d$, our approach is to approximate the $d$th-order Taylor expansion of $f$ with an “sos-convex” polynomial (see Section 2 for a definition). Sos-convex polynomials form a subclass of convex polynomials whose convexity has an explicit algebraic proof. One can then use a first-order sum of squares relaxation to minimize this sos-convex polynomial. It turns out that both the task of finding a suitable sos-convex polynomial and that of minimizing it can be carried out by solving two semidefinite programs whose sizes are polynomial in the dimension $n$ (in fact of the same order as the number of terms in the Taylor expansion). As is well known, semidefinite programs can be solved to arbitrary accuracy in polynomial time; see [48] and references therein.

We work with sos-convex polynomials instead of general convex polynomials since the latter set lacks a tractable description [5], and the former, as we show, turns out to be sufficient for achieving an algorithm with superlinear local convergence. Our sum of squares based algorithm works for higher-order Newton methods of any order $d$ and can be easily implemented using any sum of squares parser (e.g., YALMIP [35] or SOSTOOLS [45]). This is in contrast to previous work where implementable algorithms have been worked out only for $d=3$; see [40], [23, Sect. 1.5], [25, Sect. 5]. While we present our algorithms in the unconstrained case, they can be readily implemented in the presence of sos-convex constraints (such as linear constraints or convex quadratic constraints). We note, however, that our interest in this paper is only in generalizing Newton’s method in terms of its convergence order and polynomial work per iteration, and not in the practical aspects of implementation. Designing more scalable algorithms for semidefinite programs is an active area of research [36, 49]. In addition, we believe that there are promising future research directions which could make our algorithms more practical at larger scale (see Section 7).

1.1 Related Work

Over the years, there have been many adaptations of and extensions to Newton’s method. A primary example is the pioneering work of Nesterov and Polyak [41], where the idea of Newton’s method with cubic regularization was introduced. We do not review the large literature that emerged from this work since the order $d$ of Taylor expansion in this line of work is still equal to 2, and hence these methods are not considered “higher-order” (i.e., $d>2$). However, the framework that we propose, similar to most of the literature, follows the structure of [41] (and [33, 37]) in terms of minimizing, in each iteration, a Taylor expansion of a certain order plus an appropriate regularization term. Recently, there has been a body of work following this structure with Taylor expansions of order higher than two [40, 9, 12, 28, 29, 25]. Unlike our paper, these works are in the setting of convex optimization, do not study the complexity of minimizing the regularized Taylor expansion in each iteration (except in the case of $d=3$ for a subset of these papers), and derive sublinear rates of global convergence. There has also been work on lower bounds on the rates of convergence for such methods [11, 1, 13, 40]. These lower bounds are nearly achieved by the algorithms in the aforementioned papers. The recent textbook [17] provides an accessible summary of this literature and its broader scope. See also [14, 16, 15] and references therein.

In terms of work per iteration of higher-order Newton methods, Nesterov presents a polynomial-time algorithm in [40] for minimizing a quartically-regularized third-order Taylor expansion. This problem was revisited recently in [18], where an algorithm for recovering an approximate second-order point for a possibly nonconvex quartically-regularized third-order Taylor expansion is presented. In [47], a different third-order Newton method is presented which has polynomial work per iteration. In each iteration, this algorithm moves to a local minimum of the third-order Taylor expansion. It turns out that local minima of cubic polynomials can be found by semidefinite programs of polynomial size [7]. To the best of our knowledge, no efficient algorithm for higher-order Newton methods of degree $d>3$ has been presented. In fact, designing such an algorithm is referred to as an open problem in [23, Sec. 1.5] and [25, Sec. 5]. Interestingly, Nesterov asks in [40, Sec. 6] whether it is possible to tackle this problem using “some tools from algebraic geometry and the related technique of sums of squares”. This is precisely the approach that we take in this paper.

To our knowledge, the only works that establish superlinear rates of local convergence for higher-order Newton methods are [47] and [24] (and the related PhD thesis [23]), the latter of which came to our attention at the time of writing this paper. In [47], the authors establish a third-order local convergence rate for an unregularized third-order Newton method applied to a strongly convex function. In [24], the authors establish superlinear local convergence for higher-order Newton methods applied to convex optimization problems with composite objective. When the smooth part of the objective function is strongly convex, the authors show local convergence of order $d$ in function value and norm of the subgradient for their proposed $d$th-order Newton method. An algorithm carrying out the work per iteration of this method, however, is available only in the case of $d=3$ (and is the same as that in [40]). Moreover, similar to much of the literature, the regularization term that is added to the Taylor expansion in this method requires knowledge of the Lipschitz constant of the $d$th derivative of $f$. Our proof technique for local superlinear convergence is different from that of [24], both in the parts where the sum of squares programming aspects come in and in the parts where they do not. Furthermore, our method has polynomial work per iteration for any degree $d$. It also does not rely on knowledge of any Lipschitz constants. Our regularization term is instead derived from the optimal value of a semidefinite program which can be written down from the coefficients of the Taylor expansion alone. This optimized approach can potentially lead to smaller deviations from the Taylor expansion and therefore an improved convergence factor.
Finally, we note that in our work, assumptions on convexity of $f$ and knowledge of the Lipschitz constant of its $d$th derivative are made only in Section 6, where global convergence is established. Our approach in Section 6 is based on incorporating sum of squares methods into the framework of Nesterov in [40], though in theory this can also be done with other globally convergent higher-order Newton methods. In fact, at the time of revision, there has already been interesting follow-up work to our paper which combines our sum of squares framework with adaptive regularization techniques for tensor methods and analyzes the complexity of the resulting algorithm for finding an approximate stationary point of a nonconvex function [19].

1.2 Organization and Contributions

In Section 2, we review preliminaries on sos-convexity, sos-convex polynomial optimization, and error rates of derivatives of Taylor expansions. In Section 3, we present our main algorithm (Algorithm 1). In Section 4, we prove that our algorithm is well-defined in the sense that the semidefinite programs it executes are always feasible and that the next iterate is always uniquely defined (Theorem 3). We then prove that our semidefinite programming-based $d$th-order Newton scheme has local convergence of order $d$ (Theorem 4). Compared to the classical Newton method, this leads to fewer calls to the Taylor expansion oracle (a common oracle in this literature; see, e.g., [12], [29, Sect. 2.2], [1, Sect. 1.1], [11, Sect. 2], [17, Chap. 1.2]) at the price of requiring higher-order derivatives. The proof of Theorem 4 is more involved than the proof of local quadratic convergence of Newton’s method. This is in part because the expression for the next iterate of Newton’s method is explicit, whereas our next iterate comes from the solution to two semidefinite programs. We also remark that our proof framework is applicable to a broader class of higher-order Newton methods that may not necessarily use sum of squares techniques.

In Section 5, we present three numerical examples. We give an explicit expression and a geometric interpretation of our third-order Newton method in dimension one. We compare the basins of attraction of local minima for our higher-order methods to those of the classical Newton method. In Section 6, we present a slightly modified higher-order Newton method which is globally convergent under additional convexity and Lipschitzness assumptions similar to those in [40]. This modified algorithm works in the case of $d$ being an odd integer and still has polynomial work per iteration and local convergence of order $d$. Finally, in Section 7, we present a few directions for future research.

2 Preliminaries

2.1 SOS-Convex Polynomial Optimization

In each iteration of the higher-order Newton methods that we propose, two semidefinite programs (SDPs) need to be solved. These SDPs arise from the notion of sos-convexity, which is reviewed in this subsection.

Definition 1.

A polynomial $p:\mathbb{R}^{n}\mapsto\mathbb{R}$ is said to be a sum of squares (sos) if there exist polynomials $q_{1},\dots,q_{r}:\mathbb{R}^{n}\mapsto\mathbb{R}$ such that $p=\sum_{i=1}^{r}q_{i}^{2}$.

As is well known, one can check if a polynomial is sos by solving an SDP. The next theorem establishes this link. We denote that a symmetric matrix $A$ is positive semidefinite (i.e., has nonnegative eigenvalues) with the standard notation $A\succeq 0$.

Theorem 1 (see, e.g., [44]).

For a variable $x\in\mathbb{R}^{n}$ and an even integer $d$, let $\phi_{\frac{d}{2}}(x)$ denote the vector of all monomials of degree at most $\frac{d}{2}$ in $x$. A polynomial $p:\mathbb{R}^{n}\mapsto\mathbb{R}$ of degree $d$ is sos if and only if there exists a symmetric matrix $Q$ such that (i) $p(x)=\phi_{\frac{d}{2}}(x)^{T}Q\phi_{\frac{d}{2}}(x)$ for all $x\in\mathbb{R}^{n}$, and (ii) $Q\succeq 0$.

The first constraint above can be written as a finite number of linear equations by coefficient matching. Therefore, the two constraints together represent the intersection of an affine subspace with the cone of positive semidefinite matrices. Thus, as polynomials can be encoded as an ordered vector of coefficients, the set of sos polynomials of a given degree has a description as the feasible region of a semidefinite program. Furthermore, the size of this SDP grows polynomially in $n$ when $d$ is fixed.
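To make the coefficient-matching step concrete, the following sketch verifies a Gram-matrix certificate for the univariate polynomial $p(x)=x^{4}+4x^{3}+6x^{2}+4x+5=(x^{2}+2x+1)^{2}+2^{2}$ with $\phi_{2}(x)=(1,x,x^{2})^{T}$. It does not solve an SDP: the matrix $Q$ is supplied by hand, and the example polynomial as well as the helper names `gram_to_coeffs`, `det`, and `is_psd` are assumptions made for illustration. Positive semidefiniteness is tested through nonnegativity of all principal minors, which is exact for the small integer matrix used here but would not be a sensible choice for large instances.

```python
from itertools import combinations


def gram_to_coeffs(Q):
    """Coefficients (lowest degree first) of phi(x)^T Q phi(x), phi(x) = (1, x, ..., x^(m-1))."""
    m = len(Q)
    coeffs = [0] * (2 * m - 1)
    for i in range(m):
        for j in range(m):
            # The monomials x^i and x^j multiply to x^(i+j).
            coeffs[i + j] += Q[i][j]
    return coeffs


def det(M):
    """Determinant by cofactor expansion along the first row (fine for tiny matrices)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j in range(len(M)):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total


def is_psd(Q):
    """A symmetric matrix is PSD if and only if every principal minor is nonnegative."""
    n = len(Q)
    for size in range(1, n + 1):
        for idx in combinations(range(n), size):
            sub = [[Q[i][j] for j in idx] for i in idx]
            if det(sub) < 0:
                return False
    return True


# Gram matrix built from q1 = 1 + 2x + x^2 and q2 = 2, i.e., Q = v1 v1^T + v2 v2^T
# with v1 = (1, 2, 1) and v2 = (2, 0, 0).
Q = [[5, 2, 1],
     [2, 4, 2],
     [1, 2, 1]]
p_coeffs = [5, 4, 6, 4, 1]  # p(x) = 5 + 4x + 6x^2 + 4x^3 + x^4

print(gram_to_coeffs(Q) == p_coeffs, is_psd(Q))
```

In an actual sos program, the entries of $Q$ would be decision variables, the equalities produced by `gram_to_coeffs` would be the linear constraints, and an SDP solver would enforce $Q\succeq 0$.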

Throughout this paper, we denote the gradient vector (resp. Hessian matrix) of a function $g:\mathbb{R}^{n}\mapsto\mathbb{R}$ with the standard notation $\nabla g$ (resp. $\nabla^{2}g$).

Definition 2 (SOS-Convex).

A polynomial $p:\mathbb{R}^{n}\mapsto\mathbb{R}$ is said to be sos-convex if the polynomial $q:\mathbb{R}^{n}\times\mathbb{R}^{n}\mapsto\mathbb{R}$ defined as $q(x,y):=y^{T}\nabla^{2}p(x)y$ is sos.

Note that any sos-convex polynomial is convex. The converse statement is not true, except for certain dimensions and degrees (see [6]). By Theorem 1 above, the set of sos-convex polynomials of a given degree also forms the feasible region of a semidefinite program. Because the polynomial $q(x,y)$ is quadratic in $y$, one can reduce the size of the underlying SDP. More specifically, a polynomial $p:\mathbb{R}^{n}\mapsto\mathbb{R}$ of degree $d$ is sos-convex if and only if there exists a symmetric matrix $Q\succeq 0$ such that $y^{T}\nabla^{2}p(x)y=(\phi_{\frac{d}{2}-1}(x)\otimes y)^{T}Q(\phi_{\frac{d}{2}-1}(x)\otimes y)$. (Here, $\otimes$ denotes the Kronecker product. Note also that an odd-degree polynomial can never be convex, except for the trivial case of affine polynomials.) We see that the size of the SDP that represents sos-convex polynomials of degree $d$ in $n$ variables grows polynomially in $n$ when $d$ is fixed.
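As a small worked instance of this reduced characterization (the polynomial is an illustrative choice, not one used elsewhere in the paper), take $n=1$, $d=4$, and $p(x)=x^{4}+x^{2}$. Then $\nabla^{2}p(x)=12x^{2}+2$, $\phi_{\frac{d}{2}-1}(x)=(1,x)^{T}$, and $\phi_{\frac{d}{2}-1}(x)\otimes y=(y,xy)^{T}$, so that

\[
y^{T}\nabla^{2}p(x)y=2y^{2}+12x^{2}y^{2}=\begin{pmatrix}y\\ xy\end{pmatrix}^{T}\begin{pmatrix}2&0\\ 0&12\end{pmatrix}\begin{pmatrix}y\\ xy\end{pmatrix}.
\]

Since $Q=\mathrm{diag}(2,12)\succeq 0$, the polynomial $p$ is sos-convex. The Gram matrix here is $2\times 2$, whereas applying Theorem 1 directly to the degree-4 polynomial $q(x,y)$ with the vector of all six monomials of degree at most 2 in $(x,y)$ would involve a $6\times 6$ matrix.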

We next explain why sos-convex polynomial optimization problems can be solved with the first level of the so-called Lasserre hierarchy. A polynomial optimization problem is a problem of the form

\[
\begin{aligned}
\inf_{x\in\mathbb{R}^{n}}\quad & g_{0}(x)\\
\text{s.t.}\quad & g_{j}(x)\leq 0,\quad j=1,\ldots,m,
\end{aligned}
\tag{1}
\]

where $g_{j}(x)$ are real-valued polynomial functions of a variable $x\in\mathbb{R}^{n}$. The first-level Lasserre relaxation (see [30]) corresponding to problem (1) takes the form

\[
\begin{aligned}
\sup_{\gamma\in\mathbb{R},\,\lambda\in\mathbb{R}^{m}}\quad & \gamma\\
\text{s.t.}\quad & g_{0}(x)-\gamma+\sum_{j=1}^{m}\lambda_{j}g_{j}(x)\text{ is sos},\\
& \lambda_{j}\geq 0,\quad j=1,\ldots,m.
\end{aligned}
\tag{2}
\]

The reader can check that the optimal value of (2) is always a lower bound on that of (1). The next theorem establishes that this lower bound is tight when the defining polynomials of (1) are sos-convex.

Theorem 2 (See Corollary 2.5 from [31], and Theorem 3.3 from [32]).

Suppose that the polynomials $g_{0},\dots,g_{m}$ in (1) are sos-convex, the optimal value of (1) is finite, and that the Slater condition holds (that is, there exists some $\bar{x}\in\mathbb{R}^{n}$ such that $g_{j}(\bar{x})<0$ for all $j=1,\ldots,m$). Then, the optimal values of (1) and (2) are the same. Moreover, an optimal solution to (1) can be readily recovered from a solution to the semidefinite program that is dual to (2).

This result is already proven by Lasserre in [32] using a lemma of Helton and Nie from [26]. For completeness and for the benefit of the reader, we give an alternative short proof of the first claim.

Proof.

Recalling that an sos polynomial is nonnegative and that $\lambda_{j}\geq 0$ for $j=1,\ldots,m$, it is easy to see that the optimal value of (1) is larger than or equal to the optimal value of (2). To show the opposite inequality, let $\gamma^{*}$ be the optimal value of (1). Then, the convex function $x\mapsto g_{0}(x)-\gamma^{*}$ is nonnegative over the set $\{x\mid g_{j}(x)\leq 0,\ j=1,\ldots,m\}$. By the convex Farkas lemma (see, e.g., [46, Theorem 2.1]), there exists a nonnegative vector $\lambda^{*}\in\mathbb{R}^{m}$ such that $p(x):=g_{0}(x)-\gamma^{*}+\sum_{j=1}^{m}\lambda^{*}_{j}g_{j}(x)\geq 0$ for all $x\in\mathbb{R}^{n}$. Notice that $p(x)$ is sos-convex since it is a conic combination of sos-convex polynomials.
Thus, by [6, Theorem 3.1], the polynomial $q(x,y):=p(y)-p(x)-\nabla p(x)^{T}(y-x)$ is sos. Let $x^{*}$ be an optimal solution to (1) (such a vector must exist [10]). Observe that the polynomial $y\mapsto q(x^{*},y)$ is also sos (since it is the restriction of $q(x,y)$ to $x=x^{*}$). By optimality of $x^{*}$ to (1), we have $p(x^{*})\leq 0$. Since $p$ is nonnegative, we have $p(x^{*})=0$ and $\nabla p(x^{*})=0$. Thus, $p(y)=q(x^{*},y)$, and hence $p(y)$ must be sos. Therefore, $\gamma^{*},\lambda^{*}$ is feasible to (2), and hence the optimal value of (2) is at least $\gamma^{*}$; i.e., the optimal value of (1).

For a proof of the second claim and an explicit expression of the dual of (2), see Theorem 3.3 from [32].
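For a concrete instance of the first claim of Theorem 2 (an illustrative example chosen here, not one from the references), consider the unconstrained problem ($m=0$) with $g_{0}(x)=x^{2}-2x+3$, which is sos-convex since $y^{T}\nabla^{2}g_{0}(x)y=2y^{2}$. With $\phi_{1}(x)=(1,x)^{T}$, coefficient matching determines the Gram matrix uniquely, and the constraint in (2) reads

\[
g_{0}(x)-\gamma=\begin{pmatrix}1\\ x\end{pmatrix}^{T}\begin{pmatrix}3-\gamma&-1\\ -1&1\end{pmatrix}\begin{pmatrix}1\\ x\end{pmatrix},\qquad\begin{pmatrix}3-\gamma&-1\\ -1&1\end{pmatrix}\succeq 0.
\]

This matrix is positive semidefinite if and only if $3-\gamma\geq 0$ and $(3-\gamma)\cdot 1-1\geq 0$, i.e., $\gamma\leq 2$. Hence the optimal value of (2) is $\gamma^{*}=2$, which matches $\min_{x}g_{0}(x)=g_{0}(1)=2$, and the corresponding sos certificate is $g_{0}(x)-\gamma^{*}=(x-1)^{2}$.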

2.2 Error rates of Taylor remainders

In this subsection, we review certain error rates of multivariate Taylor expansions that will be used in our arguments. We denote by dfsuperscript𝑑𝑓\nabla^{d}f∇ start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT italic_f the d𝑑ditalic_dth order symmetric tensor of order-d𝑑ditalic_d partial derivatives of the function f𝑓fitalic_f. We denote the tensor product of a set of vectors x1,,xdnsubscript𝑥1subscript𝑥𝑑superscript𝑛x_{1},\ldots,x_{d}\in\mathbb{R}^{n}italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_x start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT with x1x2xdsubscript𝑥1subscript𝑥2subscript𝑥𝑑x_{1}\boxtimes x_{2}\boxtimes\ldots\boxtimes x_{d}italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⊠ italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ⊠ … ⊠ italic_x start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT.444We use this slightly nonstandard notation to avoid confusion with the Kronecker product. We use the notation xdsuperscript𝑥absent𝑑x^{\boxtimes d}italic_x start_POSTSUPERSCRIPT ⊠ italic_d end_POSTSUPERSCRIPT to denote the tensor product of a vector xn𝑥superscript𝑛x\in\mathbb{R}^{n}italic_x ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT with itself d𝑑ditalic_d times. With this notation, we can define the d𝑑ditalic_dth-order Taylor expansion of a d𝑑ditalic_d-times differentiable function f𝑓fitalic_f at a point x¯¯𝑥\bar{x}over¯ start_ARG italic_x end_ARG as

Tx¯,d(x):=f(x¯)+i=1d1i!if(x¯),(xx¯)i,T_{\bar{x},d}(x)\mathrel{\mathop{:}}=f(\bar{x})+\sum_{i=1}^{d}\frac{1}{i!}% \langle\nabla^{i}f(\bar{x}),(x-\bar{x})^{\boxtimes i}\rangle,italic_T start_POSTSUBSCRIPT over¯ start_ARG italic_x end_ARG , italic_d end_POSTSUBSCRIPT ( italic_x ) : = italic_f ( over¯ start_ARG italic_x end_ARG ) + ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG italic_i ! end_ARG ⟨ ∇ start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT italic_f ( over¯ start_ARG italic_x end_ARG ) , ( italic_x - over¯ start_ARG italic_x end_ARG ) start_POSTSUPERSCRIPT ⊠ italic_i end_POSTSUPERSCRIPT ⟩ ,

where ,\langle\cdot,\cdot\rangle⟨ ⋅ , ⋅ ⟩ denotes the standard tensor inner product. The remainder or error term of the Taylor expansion is

Rx¯,d(x):=f(x)Tx¯,d(x).R_{\bar{x},d}(x)\mathrel{\mathop{:}}=f(x)-T_{\bar{x},d}(x).italic_R start_POSTSUBSCRIPT over¯ start_ARG italic_x end_ARG , italic_d end_POSTSUBSCRIPT ( italic_x ) : = italic_f ( italic_x ) - italic_T start_POSTSUBSCRIPT over¯ start_ARG italic_x end_ARG , italic_d end_POSTSUBSCRIPT ( italic_x ) .

For a $d$th-order tensor $D$, let us define the following norm:

$$\|D\| := \max_{\|x_{1}\|,\ldots,\|x_{d}\|\leq 1}\langle D,x_{1}\boxtimes x_{2}\boxtimes\ldots\boxtimes x_{d}\rangle,$$

where $\|x_{i}\|$ denotes the Euclidean 2-norm of the vector $x_{i}\in\mathbb{R}^{n}$. Note that for $d=1$ and $d=2$, this expression reduces to the standard Euclidean norm and the spectral norm, respectively.
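The two special cases can be checked directly; the short verification below is added for illustration. For $d=1$, $D$ is a vector and the Cauchy-Schwarz inequality gives

$$\max_{\|x_{1}\|\leq 1}\langle D,x_{1}\rangle=\|D\|_{2},$$

with the maximum attained at $x_{1}=D/\|D\|_{2}$ when $D\neq 0$. For $d=2$, $D$ is a symmetric matrix and

$$\max_{\|x_{1}\|,\|x_{2}\|\leq 1}x_{1}^{T}Dx_{2}=\sigma_{\max}(D)=\max_{i}|\lambda_{i}(D)|,$$

where the last equality uses the fact that the singular values of a symmetric matrix are the absolute values of its eigenvalues.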

We will need the following lemma in Section 4.

Lemma 1 (see, e.g., inequality (11) in [9]).

Fix a vector $\bar{x}\in\mathbb{R}^{n}$. Suppose $\nabla^{d}f$ has a Lipschitz constant $L$ over a convex set $C$ containing $\bar{x}$, i.e.,

$$\|\nabla^{d}f(x)-\nabla^{d}f(y)\|\leq L\|x-y\|$$

for all $x,y\in C$. Then, for any $x\in C$, we have

$$\|\nabla R_{\bar{x},d}(x)\|\leq\frac{L}{d!}\|x-\bar{x}\|^{d}$$

and

$$\|\nabla^{2}R_{\bar{x},d}(x)\|\leq\frac{L}{(d-1)!}\|x-\bar{x}\|^{d-1}.$$
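The two bounds of Lemma 1 are easy to test numerically in one dimension. The sketch below is an illustration under our own assumptions: it takes $f=\sin$, $d=3$, and $C=\mathbb{R}$, where $L=1$ is a valid Lipschitz constant for $f'''$ because $|f''''|\leq 1$. The function names are hypothetical.

```python
import math

def grad_remainder(x_bar, x):
    """R'(x) for f = sin and d = 3: f'(x) minus the derivative of T_{x_bar,3}."""
    h = x - x_bar
    taylor_grad = math.cos(x_bar) - math.sin(x_bar) * h - 0.5 * math.cos(x_bar) * h * h
    return math.cos(x) - taylor_grad

def hess_remainder(x_bar, x):
    """R''(x) for f = sin and d = 3: f''(x) minus the second derivative of T_{x_bar,3}."""
    h = x - x_bar
    taylor_hess = -math.sin(x_bar) - math.cos(x_bar) * h
    return -math.sin(x) - taylor_hess

L, d, x_bar = 1.0, 3, 0.7
points = [x_bar + 0.1 * j for j in range(-30, 31)]
slack = 1e-12  # absorbs floating-point rounding near x = x_bar

# |R'(x)| <= L/d! * |x - x_bar|^d
grad_ok = all(
    abs(grad_remainder(x_bar, x)) <= L / math.factorial(d) * abs(x - x_bar) ** d + slack
    for x in points
)
# |R''(x)| <= L/(d-1)! * |x - x_bar|^(d-1)
hess_ok = all(
    abs(hess_remainder(x_bar, x)) <= L / math.factorial(d - 1) * abs(x - x_bar) ** (d - 1) + slack
    for x in points
)
```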

3 Algorithm Definition

For a given integer $d\geq 3$, we consider the task of minimizing a function $f$ which is assumed to have derivatives up to order $d$, and a local minimum $x^{*}$ satisfying $\nabla^{2}f(x^{*})\succ 0$. We also assume that the $d$th derivative of $f$ is locally Lipschitz around the point $x^{*}$, i.e., there is a radius $r_{L}>0$ and a scalar $L\geq 0$ such that for all points $x,y$ in the set $\{z\in\mathbb{R}^{n}\mid\|z-x^{*}\|\leq r_{L}\}$, we have

$$\|\nabla^{d}f(x)-\nabla^{d}f(y)\|\leq L\|x-y\|.$$

Note that the latter assumption is always satisfied if the $(d+1)$th derivative of $f$ exists and is continuous. Our goal is to minimize $f$ by iteratively minimizing a surrogate function of the type

$$T_{x_{k},d}(x)+t\|x-x_{k}\|^{d'},$$

where $x_{k}$ is our current iterate, $T_{x_{k},d}$ is the Taylor expansion of $f$ of order $d$ at $x_{k}$, $d'$ is the smallest even integer greater than $d$ (as we require the surrogate to be a polynomial), and $t$ is chosen according to the following sum of squares program:

$$\begin{aligned}
\min_{t\in\mathbb{R}}\quad & t \\
\text{s.t.}\quad & T_{x_{k},d}(x)+t\|x-x_{k}\|^{d'}\quad\text{sos-convex} \\
& t\geq 0.
\end{aligned}\tag{3}$$

In view of Theorem 1 and the remarks after Definition 2, this program can be reformulated as an SDP of size polynomial in $n$. Letting $t(x_{k})$ denote the optimal value of (3) for a given $x_{k}$, we define our surrogate function to be

$$\psi_{x_{k},d}(x) := T_{x_{k},d}(x)+t(x_{k})\|x-x_{k}\|^{d'}.\tag{4}$$

In our algorithm, we choose $x_{k+1}$ to be the minimizer of $\psi_{x_{k},d}$ (which exists and is unique; see Theorem 3 below). By Theorem 2, since $\psi_{x_{k},d}$ is sos-convex, we can find its minimizer via another SDP of size polynomial in $n$.
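For intuition, consider the univariate case $n=1$ with $d=3$, so that $d'=4$. The following worked example is our own illustration; it relies on the standard fact that a nonnegative univariate polynomial is a sum of squares, so that sos-convexity coincides with convexity here. Writing $h := x-x_{k}$, the second derivative of $T_{x_{k},3}(x)+t(x-x_{k})^{4}$ is the quadratic

$$f''(x_{k})+f'''(x_{k})\,h+12\,t\,h^{2}.$$

When $f''(x_{k})>0$, this quadratic is nonnegative for all $h$ if and only if its discriminant satisfies $f'''(x_{k})^{2}-48\,t\,f''(x_{k})\leq 0$, so the optimal value of (3) is

$$t(x_{k})=\frac{f'''(x_{k})^{2}}{48\,f''(x_{k})}.$$

The next iterate is then $x_{k+1}=x_{k}+h$, where $h$ is the unique real root of the cubic

$$f'(x_{k})+f''(x_{k})\,h+\tfrac{1}{2}f'''(x_{k})\,h^{2}+4\,t(x_{k})\,h^{3}=0.$$

Uniqueness holds because the derivative of this cubic in $h$ is the nonnegative quadratic above, which vanishes at no more than one point.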

If $x_{k}$ is far from $x^{*}$ so that $\nabla^{2}f(x_{k})$ is not positive definite, it may occur that (3) is infeasible. If this occurs, we fix a positive scalar $\varepsilon$ (our analysis applies to any positive value of $\varepsilon$) and instead solve the SDP:

$$\begin{aligned}
\min_{\bar{t}\in\mathbb{R}}\quad & \bar{t} \\
\text{s.t.}\quad & T_{x_{k},d}(x)+\frac{1}{2}\Big(\varepsilon-\lambda_{\min}(\nabla^{2}f(x_{k}))\Big)\|x-x_{k}\|^{2}+\bar{t}\|x-x_{k}\|^{d'}\quad\text{sos-convex} \\
& \bar{t}\geq 0.
\end{aligned}\tag{5}$$

Let $\bar{t}(x_{k})$ denote the optimal value of (5) and define

$$\bar{\psi}_{x_{k},d}(x) := T_{x_{k},d}(x)+\frac{1}{2}\Big(\varepsilon-\lambda_{\min}(\nabla^{2}f(x_{k}))\Big)\|x-x_{k}\|^{2}+\bar{t}(x_{k})\|x-x_{k}\|^{d'}.\tag{6}$$

We then let $x_{k+1}$ be the minimizer of $\bar{\psi}_{x_{k},d}$ (which again exists and is unique; see Theorem 3 below). As before, we can find a minimizer of $\bar{\psi}_{x_{k},d}$ by solving an SDP of size polynomial in $n$; see Theorem 2.
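The role of the quadratic shift can be seen from a short computation, added here for illustration. Since $d'\geq 4$, the term $\bar{t}\|x-x_{k}\|^{d'}$ has zero Hessian at $x=x_{k}$, so the Hessian of the shifted surrogate at the current iterate is

$$\nabla^{2}f(x_{k})+\Big(\varepsilon-\lambda_{\min}(\nabla^{2}f(x_{k}))\Big)I,$$

whose smallest eigenvalue equals $\varepsilon>0$. The shifted surrogate is therefore strictly convex near $x_{k}$, just as the unshifted surrogate is when $\nabla^{2}f(x_{k})\succ 0$.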

Our overall algorithm is summarized below:

Parameter: $\varepsilon>0$
Input: $x_{0}\in\mathbb{R}^{n}$
1 for $k=0,1,\dots$ do
2       if $\nabla^{2}f(x_{k})\succ 0$ then
3             Solve (3) to find $t(x_{k})$
4             Let $x_{k+1}$ be the minimizer of $\psi_{x_{k},d}$ (see (4))
5            
6      else
7             Solve (5) to find $\bar{t}(x_{k})$
8             Let $x_{k+1}$ be the minimizer of $\bar{\psi}_{x_{k},d}$ (see (6))
9            
10       end if
11      
12 end for
Algorithm 1 $d$th-order Newton method
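The following Python sketch runs Algorithm 1 in the simplest setting $n=1$, $d=3$, $d'=4$. It is a hedged illustration rather than the implementation used in the paper: in one variable a polynomial is sos-convex exactly when it is convex, so we assume the two SDPs can be replaced by the closed-form value $t=f'''(x_{k})^{2}/(48c)$, where $c=f''(x_{k})$ if $f''(x_{k})>0$ and $c=\varepsilon$ otherwise (this follows from requiring the quadratic $c+f'''(x_{k})h+12th^{2}$ to have a nonpositive discriminant). The names `third_order_newton_step` and `minimize_1d`, the bisection solver, and both test functions are our own choices.

```python
import math

def third_order_newton_step(d1, d2, d3, eps=1.0):
    """One step of the d = 3 method for univariate f.

    d1, d2, d3 are f'(x_k), f''(x_k), f'''(x_k). Returns h with x_{k+1} = x_k + h.
    """
    # Quadratic coefficient: f''(x_k) if positive, else the shifted value eps.
    c2 = d2 if d2 > 0 else eps
    # Smallest t >= 0 making c2 + d3*h + 12*t*h**2 >= 0 for every h.
    t = d3 * d3 / (48.0 * c2)

    def g(h):
        # Derivative of the surrogate with respect to h.
        return d1 + c2 * h + 0.5 * d3 * h * h + 4.0 * t * h ** 3

    # g is strictly increasing, so bracket its unique root and bisect.
    lo, hi = -1.0, 1.0
    while g(lo) > 0:
        lo *= 2.0
    while g(hi) < 0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def minimize_1d(derivs, x0, iters=12, eps=1.0):
    """Iterate the step above; derivs(x) returns (f'(x), f''(x), f'''(x))."""
    x = x0
    for _ in range(iters):
        x += third_order_newton_step(*derivs(x), eps=eps)
    return x

# Convex example: f(x) = exp(x) - 2x, minimized at log(2).
x_convex = minimize_1d(lambda x: (math.exp(x) - 2.0, math.exp(x), math.exp(x)), 0.0)
# Nonconvex example: f(x) = x**4/4 - x**2/2, started where f'' < 0.
x_double_well = minimize_1d(lambda x: (x ** 3 - x, 3 * x * x - 1.0, 6 * x), 0.3)
```

In the second example the first two iterates have $f''(x_{k})<0$, so the sketch exercises the `else` branch of the algorithm before switching to the unshifted surrogate near the local minimum at $x=1$.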

4 Algorithm Analysis and Convergence

In this section, we present our main technical results. Theorem 3 shows that our algorithm is well-defined for all initial conditions. Theorem 4 gives our convergence result. We remind the reader that the assumptions made on $f$ are described in the first paragraph of Section 3. In particular, the function $f$ is not required to be convex, and the $d$th derivatives of $f$ are not required to be globally Lipschitz.

Theorem 3.

Algorithm 1 is well-defined in the sense that

  (i) the problems (3) and (5) are always feasible when required at Lines 3 and 7, and

  (ii) the functions $\psi_{x_{k},d}$ and $\bar{\psi}_{x_{k},d}$ (see (4) and (6)) always possess a unique minimizer when required at Lines 4 and 8.

Theorem 4.

There exist constants $r,c>0$ such that if $\|x_{0}-x^{*}\|\leq r$, then the sequence $\{x_{k}\}$ generated by Algorithm 1 satisfies

$$\|x_{k+1}-x^{*}\|\leq c\,\|x_{k}-x^{*}\|^{d}$$

for all $k$.

The power $d$ in this theorem is referred to as the order of convergence and the constant $c$ is referred to as the factor of convergence. We note that the factor of convergence arising from our proof is explicit.
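To get a feel for what the order of convergence buys, the sketch below iterates the worst-case recursion $e_{k+1}=c\,e_{k}^{d}$ for the error $e_{k}=\|x_{k}-x^{*}\|$ and counts the steps needed to reach a tolerance. The values $c=1$, $e_{0}=0.5$, and the tolerance $10^{-12}$ are arbitrary illustrative assumptions; the recursion only contracts when $c\,e_{0}^{d-1}<1$.

```python
def iterations_to_tolerance(order, c=1.0, e0=0.5, tol=1e-12):
    """Count steps k until the bound e_{k+1} = c * e_k**order drops below tol."""
    e, k = e0, 0
    while e >= tol:
        e = c * e ** order
        k += 1
    return k

# Orders 2 (classical Newton), 3, and 4.
counts = {order: iterations_to_tolerance(order) for order in (2, 3, 4)}
# With c = 1 and e0 = 0.5 this gives {2: 6, 3: 4, 4: 3}.
```

These counts compare iterations only; each higher-order iteration requires solving SDPs, so the per-iteration cost is not reflected here.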

To prove Theorems 3 and 4, we first establish some technical lemmas. Lemmas 2 and 3 are used to prove the first claim of Theorem 3; Lemmas 4 and 5 are for the second claim; and Lemmas 3, 4, and 6 are employed in the proof of Theorem 4.

In Lemma 2, we show that a particular polynomial is in the interior of the cone of sos-convex polynomials. This is used in Lemma 3 to show that we can always make our surrogate functions defined in (3) and (5) sos-convex.

Lemma 2.

Let $x := (x_{1},\ldots,x_{n})$. The polynomial

$$p(x)=x^{T}x+(x^{T}x)^{d}$$

is in the interior of the cone of sos-convex polynomials in $n$ variables and of degree at most $2d$.

Proof.

We first establish the following claim:

Claim 0. For all $d\geq 0$, the polynomial

$$\tilde{p}_{d}(x)=1+(d+1)(x^{T}x)^{d}$$

can be written as $\phi_{d}(x)^{T}Q\phi_{d}(x)$, where $\phi_{d}$ is the standard basis of monomials of degree up to $d$ with the monomials appearing in ascending order of degree, and $Q$ is a positive definite matrix.

To prove Claim 0, it suffices to show that for all $d\geq 0$, there exist a constant $\alpha_{d}>0$ and a positive definite matrix $\hat{Q}_{d}$ such that $1+\alpha_{d}(x^{T}x)^{d}=\phi_{d}(x)^{T}\hat{Q}_{d}\phi_{d}(x)$. Indeed, if $\alpha_{d}\leq d+1$, we can observe that

$$\tilde{p}_{d}(x)=1+\alpha_{d}(x^{T}x)^{d}+\left((d+1)-\alpha_{d}\right)(x^{T}x)^{d}=\phi_{d}(x)^{T}\left(\hat{Q}_{d}+Q'\right)\phi_{d}(x),$$

where $Q'$ can be taken to be positive semidefinite since $((d+1)-\alpha_{d})(x^{T}x)^{d}$ is sos. If $\alpha_{d}>d+1$, we can observe that

$$\tilde{p}_{d}(x)=\frac{d+1}{\alpha_{d}}+(d+1)(x^{T}x)^{d}+\left(1-\frac{d+1}{\alpha_{d}}\right)=\phi_{d}(x)^{T}\left(\frac{d+1}{\alpha_{d}}\hat{Q}_{d}+Q'\right)\phi_{d}(x),$$

where $Q'$ can be taken to be positive semidefinite since the positive constant $1-\frac{d+1}{\alpha_{d}}$ is sos.

Let us now proceed by induction on $d$ to prove the claim made in the previous paragraph. The case of $d=0$ is clear since we can take any $\alpha_{0}>0$ and the associated matrix $\hat{Q}_{0}$ is simply a $1\times 1$ matrix containing the scalar $1+\alpha_{0}$. Now suppose that the induction hypothesis holds for $d=k$. To construct $\alpha_{k+1}$ and $\hat{Q}_{k+1}$, we will add matrices associated with the polynomials $1+\alpha_{k}(x^{T}x)^{k}$ and $\alpha(x^{T}x)^{k+1}-\alpha_{k}(x^{T}x)^{k}$, where $\alpha$ is an arbitrary scalar. From the induction hypothesis, there exist a scalar $\alpha_{k}>0$ and a matrix $\hat{Q}_{k}\succ 0$ of size $\binom{n+k}{k}\times\binom{n+k}{k}$ that satisfy

$$1+\alpha_{k}(x^{T}x)^{k}=\phi_{k}(x)^{T}\hat{Q}_{k}\phi_{k}(x)=\phi_{k+1}(x)^{T}\begin{bmatrix}\hat{Q}_{k}&0\\ 0&0\end{bmatrix}\phi_{k+1}(x).$$

Meanwhile, observe that we can write

$$\alpha(x^{T}x)^{k+1}-\alpha_{k}(x^{T}x)^{k}=\phi_{k+1}(x)^{T}\begin{bmatrix}0&A\\ A^{T}&\alpha P\end{bmatrix}\phi_{k+1}(x)$$

for some matrices $A$ and $P\succ 0$, where the zero block is of size $\binom{n+k}{k}\times\binom{n+k}{k}$. Indeed, we can take the matrix $P$ to be diagonal with its diagonal entries equalling the coefficients of $(x^{T}x)^{k+1}$ and move the coefficients of $\alpha_{k}(x^{T}x)^{k}$ to the matrix $A$. Adding the two identities, we observe that

$$1+\alpha(x^{T}x)^{k+1}=\phi_{k+1}(x)^{T}\begin{bmatrix}\hat{Q}_{k}&A\\ A^{T}&\alpha P\end{bmatrix}\phi_{k+1}(x).$$

Since $\hat{Q}_{k}$ and $P$ are both positive definite matrices, by the Schur complement condition, whenever $\alpha P-A^{T}\hat{Q}_{k}^{-1}A\succ 0$, the matrix on the right-hand side of the above expression will be positive definite. One can therefore choose $\alpha_{k+1}$ to be any large enough value of $\alpha$ that satisfies the previous condition and let $\hat{Q}_{k+1} := \begin{bmatrix}\hat{Q}_{k}&A\\ A^{T}&\alpha_{k+1}P\end{bmatrix}$. We have thus proved Claim 0.
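As a sanity check of the induction step (an illustration of ours, not part of the proof), take $n=1$ and $k=1$ with $\alpha_{1}=2$, so that $\phi_{1}(x)=(1,x)^{T}$, $\hat{Q}_{1}=\mathrm{diag}(1,2)$, and $\phi_{2}(x)=(1,x,x^{2})^{T}$. Here $P=1$ and $A=(-1,0)^{T}$, since the term $-2x^{2}$ must be produced by the products $1\cdot x^{2}$. The Schur complement condition reads

$$\alpha P-A^{T}\hat{Q}_{1}^{-1}A=\alpha-1>0,$$

so any $\alpha>1$ is admissible. Choosing $\alpha_{2}=3$ gives

$$1+3x^{4}=\begin{bmatrix}1&x&x^{2}\end{bmatrix}\begin{bmatrix}1&0&-1\\ 0&2&0\\ -1&0&3\end{bmatrix}\begin{bmatrix}1\\ x\\ x^{2}\end{bmatrix},$$

where the $-2x^{2}$ contributed by the off-diagonal entries cancels the $2x^{2}$ on the diagonal. The leading principal minors of this matrix are $1$, $2$, and $4$, so it is positive definite, in agreement with Claim 0 for $d=2$.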

By Claim 0 (with $d$ replaced by $d-1$), we can fix a positive definite matrix $Q$ such that $1+d(x^Tx)^{d-1}=\phi_{d-1}(x)^TQ\phi_{d-1}(x)$ for all $x$. One can check that

\begin{align*}
y^T\nabla^2p(x)y &= y^T\Big(2I+2d(x^Tx)^{d-1}I+4d(d-1)(x^Tx)^{d-2}xx^T\Big)y\\
&= 2(y^Ty)\big(1+d(x^Tx)^{d-1}\big)+4d(d-1)(x^Tx)^{d-2}(x^Ty)^2\\
&= 2(y^Ty)\,\phi_{d-1}(x)^TQ\phi_{d-1}(x)+4d(d-1)(x^Tx)^{d-2}(x^Ty)^2\\
&= (\phi_{d-1}(x)\otimes y)^T(Q\otimes 2I+Q')(\phi_{d-1}(x)\otimes y),
\end{align*}

where $Q'$ can be taken to be positive semidefinite since $4d(d-1)(x^Tx)^{d-2}(x^Ty)^2$ is sos. Since the matrix $Q\otimes 2I+Q'$ is positive definite, it follows that $p$ is in the interior of the cone of sos-convex polynomials of degree at most $2d$. ∎
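The closed form in the second line of the display can be sanity-checked numerically. The sketch below assumes $p(x)=x^Tx+(x^Tx)^d$, a form inferred from the Hessian formula above rather than quoted from this section. The function names are illustrative. It compares the closed form with a second central difference of $t\mapsto p(x+ty)$, computed in exact rational arithmetic so that only the $O(h^2)$ truncation error remains.

```python
from fractions import Fraction

def p(x, d):
    """Assumed form p(x) = x^T x + (x^T x)^d, inferred from the Hessian above."""
    s = sum(xi * xi for xi in x)
    return s + s ** d

def quad_form_closed(x, y, d):
    """2 (y^T y)(1 + d (x^T x)^(d-1)) + 4 d (d-1) (x^T x)^(d-2) (x^T y)^2."""
    xx = sum(xi * xi for xi in x)
    yy = sum(yi * yi for yi in y)
    xy = sum(xi * yi for xi, yi in zip(x, y))
    return 2 * yy * (1 + d * xx ** (d - 1)) + 4 * d * (d - 1) * xx ** (d - 2) * xy ** 2

def quad_form_fd(x, y, d, h=Fraction(1, 10**6)):
    """Second central difference of t -> p(x + t y) at t = 0 (exact rationals)."""
    plus = [xi + h * yi for xi, yi in zip(x, y)]
    minus = [xi - h * yi for xi, yi in zip(x, y)]
    return (p(plus, d) - 2 * p(x, d) + p(minus, d)) / (h * h)

# Arbitrary test vectors, chosen only for illustration.
x = [Fraction(1, 2), Fraction(-1, 3), Fraction(2, 5)]
y = [Fraction(3, 4), Fraction(1, 7), Fraction(-2, 3)]
for d in (2, 3, 4):
    gap = abs(quad_form_fd(x, y, d) - quad_form_closed(x, y, d))
    print(d, float(gap))  # small truncation error of the difference quotient
```

The check covers only the Hessian identity; it says nothing about the Gram matrices $Q$ and $Q'$, which would require a semidefinite programming solver.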

Lemma 3.

Suppose $f:\mathbb{R}^n\mapsto\mathbb{R}$ has continuous derivatives up to order $d$ over a compact set $B\subseteq\mathbb{R}^n$. If $\nabla^2f(x)\succ 0$ for all $x\in B$, then $t(x)$ (i.e., the optimal value of (3)) is uniformly bounded from above over $B$.

Proof.

Let $\delta$ be a positive scalar such that $\lambda_{\min}\nabla^2f(x)\geq\delta$ for all $x\in B$. Let $x'$ be any vector in $B$, and define

\[F_{x'}(x):=\frac{2}{\delta}T_{x',d}(x'+x).\]

Since $\nabla^2f(x')\succeq\delta I$, we have $\nabla^2F_{x'}(0)\succeq 2I$. Let $Q_{x'}$ (resp. $C_{x'}$) be the sum of the quadratic and higher (resp. cubic and higher) terms of $F_{x'}$. For a polynomial $p$, define $\|p\|_\infty$ as the infinity norm of the coefficients of $p$ when expressed in the standard monomial basis. By Lemma 2, we can fix a positive scalar $R$ such that for any polynomial $p$ of degree at most $d$ with $\|p\|_\infty\leq R$, we have that the polynomial $\|x\|^2+\|x\|^{d'}+p$ is sos-convex.
Fix a scalar $M$ such that $\|C_{x'}\|_\infty<M$ for all $x'\in B$. Define $\alpha:=\min\{1,\frac{R}{M}\}$. We have $\|x\mapsto C_{x'}(\alpha x)\|_\infty\leq\alpha^3\|C_{x'}\|_\infty$ since all terms of $C_{x'}$ are of cubic or higher order. Then we can write

\begin{align*}
\frac{1}{\alpha^2}Q_{x'}(\alpha x)+\|x\|^{d'} &= \frac{1}{2}x^T\nabla^2F_{x'}(0)x+\frac{1}{\alpha^2}C_{x'}(\alpha x)+\|x\|^{d'}\\
&= \frac{1}{2}x^T\big(\nabla^2F_{x'}(0)-2I\big)x+\frac{1}{\alpha^2}C_{x'}(\alpha x)+\big(\|x\|^2+\|x\|^{d'}\big).
\end{align*}

Since $\nabla^2F_{x'}(0)\succeq 2I$, the first term is sos-convex. We can bound the second term as follows: $\|x\mapsto\frac{1}{\alpha^2}C_{x'}(\alpha x)\|_\infty\leq\alpha\|C_{x'}\|_\infty\leq\alpha M\leq R$. Thus, the sum of the second and the third term is sos-convex by the definition of $R$. It follows that the polynomial

\[\frac{1}{\alpha^2}Q_{x'}(\alpha x)+\|x\|^{d'}\text{ is sos-convex.}\]

We can then conclude the sos-convexity of the polynomials

  (a) $Q_{x'}(\alpha x)+\alpha^2\|x\|^{d'}$,

  (b) $F_{x'}(\alpha x)+\alpha^2\|x\|^{d'}$,

  (c) $F_{x'}(\alpha x)+\alpha^{2-d'}\|\alpha x\|^{d'}$,

  (d) $F_{x'}(x)+\alpha^{2-d'}\|x\|^{d'}$,

  (e) $T_{x',d}(x'+x)+\frac{\delta}{2}\alpha^{2-d'}\|x\|^{d'}$, and

  (f) $T_{x',d}(x)+\frac{\delta}{2}\alpha^{2-d'}\|x-x'\|^{d'}$,

respectively (a) by scaling, (b) by the observation that the affine terms do not affect sos-convexity, (c) by rewriting, (d) by a linear change of coordinates, (e) by another rescaling, and (f) by an affine change of coordinates. Thus, we have $t(x)\leq\frac{\delta}{2}\alpha^{2-d'}$ for $x\in B$. ∎
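The coefficient-norm bounds used in this proof come down to simple bookkeeping: replacing $x$ by $\alpha x$ multiplies the coefficient of a degree-$k$ monomial by $\alpha^k$. The sketch below illustrates this on a made-up polynomial $C$ with cubic and quartic terms. The dictionary representation and the values of `R` and `M` are assumptions for illustration, not quantities from the paper.

```python
from fractions import Fraction

def scale_argument(poly, alpha):
    """Coefficients of x -> poly(alpha * x); poly maps exponent tuples to coefficients."""
    return {e: c * alpha ** sum(e) for e, c in poly.items()}

def coeff_inf_norm(poly):
    """Infinity norm of the coefficient vector in the standard monomial basis."""
    return max(abs(c) for c in poly.values())

# Hypothetical C in two variables, all terms of degree 3 or 4.
C = {(3, 0): Fraction(5), (1, 2): Fraction(-7), (2, 2): Fraction(4), (0, 4): Fraction(-9)}
R, M = Fraction(1, 2), Fraction(10)   # illustrative values with ||C||_inf < M
alpha = min(Fraction(1), R / M)

scaled = scale_argument(C, alpha)                            # C(alpha x)
rescaled = {e: c / alpha ** 2 for e, c in scaled.items()}    # (1/alpha^2) C(alpha x)

print(coeff_inf_norm(scaled) <= alpha ** 3 * coeff_inf_norm(C))
print(coeff_inf_norm(rescaled) <= alpha * coeff_inf_norm(C) <= alpha * M <= R)
```

Both comparisons rely on $\alpha\leq 1$, which is why $\alpha$ is capped at $1$ in its definition.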

We next use a quadrature rule for integration to establish a technical lemma that is needed for the remainder of this section. By a polynomial matrix, we mean a matrix whose entries are polynomial functions.

Lemma 4.

Let $M:\mathbb{R}\mapsto\mathbb{S}^{n\times n}$ be a univariate polynomial matrix whose entries have degree at most $d$, where $d$ is even. Suppose $M(s)\succeq 0$ for all $s\in[0,1]$. Then,

\[\int_0^1M(s)\,ds\succeq\frac{1}{2(d^2-1)}M(\alpha)\]

for $\alpha\in\{0,1\}$.

Proof.

Using a quadrature rule for integration proposed in [21] and analyzed in [27], there exist weights $w_0,\dots,w_d\geq 0$, with $w_0=\frac{1}{d^2-1}$, and points $s_0,\dots,s_d\in[-1,1]$, with $s_0=1$, such that for any polynomial $p$ of degree at most $d$ we have

\[\int_{-1}^1p(s)\,ds=\sum_{i=0}^dw_ip(s_i).\]

Applying this rule entrywise, we can write

\begin{align*}
\int_0^1M(s)\,ds &= \frac{1}{2}\int_{-1}^1M\left(\frac{1-s}{2}\right)ds\\
&= \frac{1}{2}\sum_{i=0}^dw_iM\left(\frac{1-s_i}{2}\right)\\
&\succeq \frac{1}{2}w_0M\left(\frac{1-s_0}{2}\right)=\frac{1}{2(d^2-1)}M(0),
\end{align*}

where the inequality holds because each $w_i$ is nonnegative and each $M\left(\frac{1-s_i}{2}\right)$ is positive semidefinite, as $\frac{1-s_i}{2}\in[0,1]$. By replacing $s$ with $1-s$, the claim with $\alpha=1$ follows. ∎
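The stated properties of the quadrature rule (endpoint node $s_0=1$ with weight $\frac{1}{d^2-1}$, nonnegative weights, exactness up to degree $d$) match the classical Clenshaw-Curtis rule on the nodes $s_i=\cos(i\pi/d)$. The sketch below assumes that this is the rule meant by [21]; the section itself does not name it. It uses the standard closed-form weights for even $d$, and then evaluates the gap in Lemma 4 in the scalar case $n=1$ for a polynomial of the form $q^2$, integrating exactly from the coefficients.

```python
import math

def clenshaw_curtis(d):
    """Nodes s_i = cos(i*pi/d) and weights of the (d+1)-point rule on [-1, 1], d even."""
    nodes, weights = [], []
    for i in range(d + 1):
        total = 1.0
        for j in range(1, d // 2 + 1):
            b = 1.0 if 2 * j == d else 2.0
            total -= b / (4 * j * j - 1) * math.cos(2 * j * i * math.pi / d)
        c = 1.0 if i in (0, d) else 2.0
        nodes.append(math.cos(i * math.pi / d))
        weights.append(c / d * total)
    return nodes, weights

def scalar_lemma_gap(q, d):
    """For p = q^2 with deg(p) <= d, return int_0^1 p(s) ds - p(0) / (2 (d^2 - 1)).

    q is a list of coefficients, lowest degree first.
    """
    p = [0.0] * (2 * len(q) - 1)
    for a, qa in enumerate(q):
        for b, qb in enumerate(q):
            p[a + b] += qa * qb
    integral = sum(coef / (k + 1) for k, coef in enumerate(p))
    return integral - p[0] / (2 * (d * d - 1))

for d in (2, 4, 8):
    nodes, weights = clenshaw_curtis(d)
    print(d, weights[0], 1 / (d * d - 1), min(weights) > 0)

# Hypothetical example: q(s) = 1 - 3 s + 2 s^2, so p = q^2 has degree 4 = d.
print(scalar_lemma_gap([1.0, -3.0, 2.0], 4))
```

For $d=2$ the rule reduces to Simpson's rule on $[-1,1]$, whose endpoint weight is $\frac{1}{3}=\frac{1}{d^2-1}$.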

The next lemma directly proves the second claim of Theorem 3 and is possibly of independent interest.

Lemma 5.

If a convex polynomial $p:\mathbb{R}^n\mapsto\mathbb{R}$ satisfies $\nabla^2p(x_0)\succ 0$ at some point $x_0\in\mathbb{R}^n$, then $p$ has a unique minimizer.

Proof.

Without loss of generality, assume $x_0=0$. Let $d$ be an even integer greater than the degree of the Hessian of $p$. For any $x\in\mathbb{R}^n$,

\begin{align*}
p(x) &= p(0)+x^T\nabla p(0)+x^T\left(\int_0^1\int_0^t\nabla^2p(sx)\,ds\,dt\right)x\\
&= p(0)+x^T\nabla p(0)+x^T\left(\int_0^1t\int_0^1\nabla^2p(stx)\,ds\,dt\right)x\\
&\geq p(0)+x^T\nabla p(0)+x^T\left(\int_0^1t\left(\frac{1}{2(d^2-1)}\nabla^2p(0)\right)dt\right)x\\
&= p(0)+x^T\nabla p(0)+\frac{1}{4}x^T\left(\frac{1}{d^2-1}\nabla^2p(0)\right)x,
\end{align*}

where the inequality follows from Lemma 4. Thus, $p$ is lower bounded by a coercive quadratic function, and hence $p$ is coercive itself. (We recall that a function $g:\mathbb{R}^n\mapsto\mathbb{R}$ is coercive if $g(x)\rightarrow\infty$ as $\|x\|\rightarrow\infty$.) A coercive function that is convex (and hence continuous) has at least one minimizer.

Suppose for the sake of contradiction that $p$ had two distinct minimizers $\bar{x},\bar{y}$. Then, by convexity, any point on the line segment connecting $\bar{x}$ and $\bar{y}$ would also be a minimizer. Since $p$ is a polynomial, it follows that $p$ must be constant along the line passing through $\bar{x}$ and $\bar{y}$. This contradicts coercivity. ∎
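As a small illustration of the quadratic lower bound in this proof, consider the univariate convex polynomial $p(x)=(x-1)^4+x^2$. This example is hypothetical and not taken from the paper. Its second derivative $12(x-1)^2+2$ has degree 2, so one may take $d=4$, and $p(0)=1$, $p'(0)=-4$, $p''(0)=14$. The sketch checks on a grid that $p$ stays above the resulting quadratic.

```python
def p(x):
    """Hypothetical convex polynomial p(x) = (x - 1)^4 + x^2."""
    return (x - 1) ** 4 + x ** 2

def lower_bound(x, d=4):
    """p(0) + x p'(0) + x^2 p''(0) / (4 (d^2 - 1)), with p(0)=1, p'(0)=-4, p''(0)=14."""
    return 1 - 4 * x + 14 * x * x / (4 * (d * d - 1))

# Evaluate the gap p - lower_bound on a grid over [-10, 10].
grid = [k / 100 for k in range(-1000, 1001)]
smallest_gap = min(p(x) - lower_bound(x) for x in grid)
print(smallest_gap)
```

Here the gap equals $x^2(x^2-4x+\frac{203}{30})$, which is nonnegative because the quadratic factor has a negative discriminant. A grid check of this kind is a sanity test only, not a proof.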

We remark that the statement of Lemma 5 does not hold for non-polynomial convex functions (consider, e.g., the univariate function $\max\{0,x^2-1\}$).

The next lemma is used in the proof of Theorem 4.

Lemma 6.

There exists a constant $r>0$ such that if $\|x_k-x^*\|\leq r$, then $\lambda_{\min}\nabla^2\psi_{x_k,d}(x^*)\geq\frac{1}{2}\lambda_{\min}\nabla^2f(x^*)$.

Proof.

We show that we can take

\[r=\min\left\{r_L,\left(\frac{(d-1)!\,\lambda_{\min}\nabla^2f(x^*)}{2L}\right)^{\frac{1}{d-1}}\right\},\]

where $r_L$ and $L$ are as in the first paragraph of Section 3. By Lemma 1, for every $x$ satisfying $\|x-x_k\|\leq r_L$, we have

\[\|\nabla^2f(x)-\nabla^2T_{x_k,d}(x)\|\leq\frac{L}{(d-1)!}\|x-x_k\|^{d-1}.\]

Thus, if $\|x^*-x_k\|\leq r$, we have

\[\|\nabla^2f(x^*)-\nabla^2T_{x_k,d}(x^*)\|\leq\frac{1}{2}\lambda_{\min}\nabla^2f(x^*).\]

It follows that

\[\lambda_{\min}\nabla^2T_{x_k,d}(x^*)\geq\frac{1}{2}\lambda_{\min}\nabla^2f(x^*).\]

Indeed, if there were a unit vector $y$ such that $y^T\nabla^2T_{x_k,d}(x^*)y<\frac{1}{2}\lambda_{\min}\nabla^2f(x^*)$, then the previous inequality would be violated.

Recall from (4) that $\psi_{x_k,d}$ is obtained by adding to $T_{x_k,d}$ the convex function $t(x_k)\|x-x_k\|^{d'}$. Therefore, we have $\nabla^2\psi_{x_k,d}(x^*)\succeq\nabla^2T_{x_k,d}(x^*)$, which gives the claim. ∎
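The choice of $r$ in this proof is exactly what makes the Hessian error bound from Lemma 1 at most $\frac{1}{2}\lambda_{\min}\nabla^2f(x^*)$. The sketch below evaluates both quantities. The constants are hypothetical, chosen only for illustration, and the function names are not from the paper.

```python
import math

def local_radius(r_L, L, lam_min, d):
    """r = min{ r_L, ((d-1)! * lam_min / (2 L))^(1/(d-1)) } as in the proof of Lemma 6."""
    return min(r_L, (math.factorial(d - 1) * lam_min / (2 * L)) ** (1 / (d - 1)))

def hessian_error_bound(dist, L, d):
    """Right-hand side L / (d-1)! * dist^(d-1) of the Hessian error estimate."""
    return L / math.factorial(d - 1) * dist ** (d - 1)

# Hypothetical constants: r_L, L as in Section 3, lam_min = lambda_min of the Hessian at x*.
r_L, L, lam_min, d = 1.0, 50.0, 0.8, 3
r = local_radius(r_L, L, lam_min, d)
print(r, hessian_error_bound(r, L, d), lam_min / 2)
```

When $r_L$ is the smaller of the two terms, the error bound is strictly below $\frac{1}{2}\lambda_{\min}\nabla^2f(x^*)$; otherwise the two coincide up to rounding.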

We now have all the ingredients to prove Theorems 3 and 4.

Proof of Theorem 3.

(i) When $\nabla^2f(x_k)\succ 0$, the proof of Lemma 3 with $B=\{x_k\}$ demonstrates a feasible solution to (3). This argument also extends to show feasibility of (5) since the polynomial $T_{x_k,d}(x)+\frac{1}{2}\big(\varepsilon-\lambda_{\min}\nabla^2f(x_k)\big)\|x-x_k\|^2$ has a positive definite Hessian at $x_k$.
(ii) At Algorithm 1 (resp. Algorithm 1), $\psi_{x_k,d}$ (resp. $\bar{\psi}_{x_k,d}$) has a positive definite Hessian at $x_k$. Moreover, the polynomial $\psi_{x_k,d}$ (resp. $\bar{\psi}_{x_k,d}$) is sos-convex and therefore convex. Thus, by Lemma 5, $\psi_{x_k,d}$ (resp. $\bar{\psi}_{x_k,d}$) has a unique minimizer. ∎

Proof of Theorem 4.

Since $d>1$, it suffices to show that there exist constants $r',c'>0$ such that if $\|x_0-x^*\|\leq r'$, then $\|x_1-x^*\|\leq c'\|x_0-x^*\|^d$.
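
To spell out why a one-step bound suffices, here is a brief sketch. The radius $r$ below is a symbol introduced here for illustration only and is not taken from the statement of Theorem 4; the argument also assumes that the same one-step bound holds from any iterate $x_k$ within distance $r'$ of $x^*$.

```latex
% Sketch: shrink the radius so that each step contracts by at least 1/2.
% r is a hypothetical symbol; it is well defined because d > 1.
r := \min\left\{ r',\ (2c')^{-\frac{1}{d-1}} \right\}
\quad\Longrightarrow\quad
\|x_{k+1}-x^*\| \;\leq\; c'\,\|x_k-x^*\|^{d-1}\,\|x_k-x^*\|
\;\leq\; c'\,r^{d-1}\,\|x_k-x^*\|
\;\leq\; \tfrac{1}{2}\,\|x_k-x^*\|
\qquad \text{whenever } \|x_k-x^*\|\leq r .
```

Under these assumptions, every iterate stays within distance $r \leq r'$ of $x^*$, so the bound with exponent $d$ can be applied inductively along the whole sequence.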

By continuity of the map $x\mapsto\lambda_{\min}\nabla^2 f(x)$, there exists a scalar $r_1>0$ such that $\lambda_{\min}\nabla^2 f(x)\geq\frac{1}{2}\lambda_{\min}\nabla^2 f(x^*)>0$ for all $x$ with $\|x-x^*\|\leq r_1$.

Let $r_2>0$ be the constant needed for the conclusion of Lemma 6 to hold. Define

$$r' := \min\{r_L, r_1, r_2\}$$

and $\Omega := \{x\in\mathbb{R}^n \mid \|x-x^*\|\leq r'\}$. Suppose $x_0\in\Omega$. Note that in this case, Algorithm 1 finds the next iterate $x_1$ by minimizing the polynomial $\psi_{x_0,d}$ defined in (4).

By the fundamental theorem of calculus, we have

$$\nabla\psi_{x_0,d}(x^*)-\nabla\psi_{x_0,d}(x_1)=\left(\int_0^1\nabla^2\psi_{x_0,d}\big(x_1+s(x^*-x_1)\big)\,ds\right)(x^*-x_1).$$

Since $x_1$ minimizes $\psi_{x_0,d}$, we have $\nabla\psi_{x_0,d}(x_1)=0$, and thus

$$\nabla\psi_{x_0,d}(x^*)=\left(\int_0^1\nabla^2\psi_{x_0,d}\big(x_1+s(x^*-x_1)\big)\,ds\right)(x^*-x_1).$$

We can bound the norm of this vector from below:

$$\|\nabla\psi_{x_0,d}(x^*)\|\geq\lambda_{\min}\left(\int_0^1\nabla^2\psi_{x_0,d}\big(x_1+s(x^*-x_1)\big)\,ds\right)\|x^*-x_1\|. \tag{7}$$

Applying first Lemma 4 and then Lemma 6, we have

$$\begin{aligned}
\lambda_{\min}\left(\int_0^1\nabla^2\psi_{x_0,d}\big(x_1+s(x^*-x_1)\big)\,ds\right) &\geq \frac{\lambda_{\min}\nabla^2\psi_{x_0,d}(x^*)}{2\big((d'-2)^2-1\big)} \\
&\geq \frac{\lambda_{\min}\nabla^2 f(x^*)}{4\big((d'-2)^2-1\big)}.
\end{aligned}$$

Substituting this into (7) and rearranging yields

$$\|x_1-x^*\|\leq\frac{4\big((d'-2)^2-1\big)}{\lambda_{\min}\nabla^2 f(x^*)}\,\|\nabla\psi_{x_0,d}(x^*)\|. \tag{8}$$

Expanding $\nabla\psi_{x_0,d}(x^*)$, we have

$$\begin{aligned}
\|\nabla\psi_{x_0,d}(x^*)\| &= \left\|\nabla T_{x_0,d}(x^*)+\nabla\big(t(x_0)\|x-x_0\|^{d'}\big)\Big|_{x^*}\right\| \\
&= \left\|\nabla T_{x_0,d}(x^*)+t(x_0)\,d'\,\|x^*-x_0\|^{d'-2}(x^*-x_0)\right\| \\
&\leq \|\nabla T_{x_0,d}(x^*)\|+t(x_0)\,d'\,\|x^*-x_0\|^{d'-1}.
\end{aligned}$$

Applying Lemma 1 and noting that $\nabla f(x^*)=0$, we have

$$\|\nabla\psi_{x_0,d}(x^*)\|\leq\frac{L}{d!}\|x^*-x_0\|^d+t(x_0)\,d'\,\|x^*-x_0\|^{d'-1}.$$

Using Lemma 3 and the fact that $\|x^*-x_0\|\leq r'$, we get

$$\begin{aligned}
\|\nabla\psi_{x_0,d}(x^*)\| &\leq \frac{L}{d!}\|x^*-x_0\|^d+\Big(\sup_{x\in\Omega}t(x)\Big)\,d'\max\{r',1\}\,\|x^*-x_0\|^d \\
&= \left(\frac{L}{d!}+\Big(\sup_{x\in\Omega}t(x)\Big)\,d'\max\{r',1\}\right)\|x^*-x_0\|^d.
\end{aligned}$$

Substituting into (8), we have

$$\|x_1-x^*\|\leq\left(\frac{4\big((d'-2)^2-1\big)}{\lambda_{\min}\nabla^2 f(x^*)}\left(\frac{L}{d!}+\Big(\sup_{x\in\Omega}t(x)\Big)\,d'\max\{r',1\}\right)\right)\|x^*-x_0\|^d$$

as desired. ∎

5 Numerical Examples

We present three examples to compare the performance of our $d$th-order Newton methods and the classical Newton method.

5.1 The Univariate Case

In the univariate case, the iterations of the classical Newton method read

$$x_{k+1}=x_k-\frac{f'(x_k)}{f''(x_k)}.$$

In terms of finding a root of $f'$, this iteration can be interpreted as first computing the first-order Taylor expansion of $f'$ at $x_k$, and then finding the root of this affine function to define $x_{k+1}$.

We derive a similar explicit formula for our higher-order Newton method in the case where $n=1$, $d=3$, and $f''$ is positive. Since convex univariate polynomials are sos-convex, finding explicit solutions to the two SDPs involved in each iteration of our algorithm reduces to arguments about roots of univariate polynomials.

Proposition 1.

In the univariate case, when $f''(x_k)>0$ and $f'''(x_k)\neq 0$, the next iterate of the $3$rd-order version of Algorithm 1 is given by

$$x_{k+1}=x_k-2\frac{f''(x_k)}{f'''(x_k)}-\sqrt[3]{\frac{f'(x_k)-\frac{2}{3}\frac{(f''(x_k))^2}{f'''(x_k)}}{\frac{(f'''(x_k))^2}{12f''(x_k)}}}.$$

Footnote 7: Note that when $f''(x_k)>0$ and $f'''(x_k)=0$, the third-order Taylor series is convex and coincides with the second-order Taylor series. Therefore, the next iterates of the third-order and the classical Newton method coincide.

Proof.

To simplify notation, we let $T := T_{x_k,3}$ and $\psi := \psi_{x_k,3}$. By translation, we may assume $x_k=0$. Then $T(x)=f(x_k)+xf'(x_k)+\frac{1}{2}x^2f''(x_k)+\frac{1}{6}x^3f'''(x_k)$ and $\psi(x)=T(x)+tx^4$, where $t$ is the smallest constant that makes $\psi$ convex.
We have $\psi''(x)=f''(x_k)+xf'''(x_k)+12tx^2$. The discriminant of $\psi''$ is $(f'''(x_k))^2-48tf''(x_k)$. Since $f''(x_k)>0$, the quadratic $\psi''$ is nonnegative everywhere exactly when this discriminant is nonpositive, and the smallest such $t$ makes it zero, which tells us that $t=\frac{(f'''(x_k))^2}{48f''(x_k)}$.

To find $x_{k+1}$, we look for the root of $\psi'$. One can write the expression for $\psi'$ in the following form:

$$\psi'(x)=\frac{(f'''(x_k))^2}{12f''(x_k)}\left(x+2\frac{f''(x_k)}{f'''(x_k)}\right)^3+f'(x_k)-\frac{2}{3}\frac{(f''(x_k))^2}{f'''(x_k)}.$$

Observe that a univariate cubic polynomial of the form $a(x-b)^3+c$, with $a\neq 0$, has a unique real root at $x=b-\sqrt[3]{\frac{c}{a}}$. Therefore, after a translation back by $x_k$, we have

$$x_{k+1}=x_k-2\frac{f''(x_k)}{f'''(x_k)}-\sqrt[3]{\frac{f'(x_k)-\frac{2}{3}\frac{(f''(x_k))^2}{f'''(x_k)}}{\frac{(f'''(x_k))^2}{12f''(x_k)}}}.$$

As in the case of the classical Newton method, the expression in Proposition 1 can be interpreted geometrically in terms of finding a root of $f'$. This iteration computes the second-order Taylor expansion of $f'$ at $x_k$, adds a sufficiently large cubic term to enforce monotonicity, and then finds the root of this monotone cubic function to define $x_{k+1}$.
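
The factored form of $\psi'$ used in the proof of Proposition 1 can be checked by expanding the cube. The following is a short verification, written with the abbreviations $f' , f'', f'''$ for the derivatives evaluated at $x_k$ (with $x_k$ translated to $0$) and with $a$ as a shorthand introduced here for the leading coefficient.

```latex
% Shorthand (introduced here): a := (f''')^2 / (12 f''), so that 4t = a for t = (f''')^2 / (48 f'').
\begin{aligned}
\psi'(x) &= f' + f''\,x + \tfrac{1}{2} f'''\,x^2 + 4t\,x^3
          = f' + f''\,x + \tfrac{1}{2} f'''\,x^2 + a\,x^3, \\[4pt]
a\left(x + \frac{2f''}{f'''}\right)^{3}
         &= a\,x^3 + 3a\,\frac{2f''}{f'''}\,x^2 + 3a\,\frac{4(f'')^2}{(f''')^2}\,x + a\,\frac{8(f'')^3}{(f''')^3} \\
         &= a\,x^3 + \tfrac{1}{2} f'''\,x^2 + f''\,x + \frac{2}{3}\frac{(f'')^2}{f'''} .
\end{aligned}
```

Subtracting the second identity from the first leaves the constant $f' - \frac{2}{3}\frac{(f'')^2}{f'''}$, which is the constant term appearing in the factored form.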

Example 1

In this example, we apply our method to the univariate function

$$f(x)=\sqrt{x^2+1}-1. \tag{9}$$

This is a strictly convex function with its unique minimizer at $x^*=0$. One can check that the classical Newton method converges to this minimizer if and only if $|x_0|<1$. Using Proposition 1, we can calculate the exact basin of convergence of our third-order Newton method to be $(-\beta,\beta)$, where

$$\beta=\sqrt{\frac{1}{3}\left(11+\frac{142}{\sqrt[3]{1691+9i\sqrt{47}}}+\sqrt[3]{1691+9i\sqrt{47}}\right)}\approx 3.407.$$

This is strictly larger than the basin of convergence of the classical method.
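As a numerical sanity check, the closed form for $\beta$ can be evaluated with complex arithmetic. The sketch below assumes the principal branch of the complex cube root (the branch is not stated in the text, so this is our reading). With that choice the two cube-root terms are complex conjugates, because $1691^2 + 81 \cdot 47 = 142^3$, so their sum is real up to rounding. The function name `beta_closed_form` is ours.

```python
import cmath
import math

def beta_closed_form():
    """Evaluate the closed-form basin radius of the third-order method on (9)."""
    w = complex(1691.0, 9.0 * math.sqrt(47.0))
    z = w ** (1.0 / 3.0)          # principal complex cube root (assumed branch)
    total = 11.0 + 142.0 / z + z  # 142 / z is the conjugate of z since |z|^2 = 142
    return math.sqrt(total.real / 3.0), total.imag

beta, imag_residue = beta_closed_form()
print(f"beta = {beta:.4f}, leftover imaginary part = {imag_residue:.1e}")
```

The printed radius should agree with the value of roughly 3.407 quoted above, and the leftover imaginary part should be at the level of floating-point rounding.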

Figure 1 demonstrates the difference between one iteration of the classical and our third-order Newton method starting at the point $x_0 = 1.5$. We display the quadratic and quartic polynomials $T_{x_0,2}$ and $\psi_{x_0,3}$. The minimizers of these polynomials are denoted by $x_1^{\textrm{Newton}}$ and $x_1^{\textrm{3ON}}$, which are respectively the next iterates of the classical and our third-order Newton method. Since the third-order Taylor expansion of $f$ provides a more accurate approximation, we see that the next iterate of our method is closer to $x^*$, while that of the classical Newton method moves farther away from $x^*$.
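To make the comparison at $x_0 = 1.5$ concrete, here is a minimal Python sketch of one classical Newton step and one step of the closed-form update from Proposition 1 for the function in (9). The derivative formulas are worked out by hand for this particular $f$, and all helper names are ours. The variant `third_order_step_stable` is an algebraically equivalent rearrangement that we introduce for illustration (it does not appear in the text); it avoids dividing by $f'''(x_k)$, which vanishes at $x^* = 0$.

```python
import math

def cbrt(v):
    """Real cube root that also handles negative arguments."""
    return math.copysign(abs(v) ** (1.0 / 3.0), v)

def derivs(x):
    """f', f'', f''' for f(x) = sqrt(x^2 + 1) - 1, derived by hand."""
    u = 1.0 + x * x
    return x / math.sqrt(u), u ** -1.5, -3.0 * x * u ** -2.5

def newton_step(x):
    """Classical Newton update x - f'(x) / f''(x)."""
    d1, d2, _ = derivs(x)
    return x - d1 / d2

def third_order_step(x):
    """Update of Proposition 1 as written; needs f''(x) > 0 and f'''(x) != 0."""
    d1, d2, d3 = derivs(x)
    numerator = d1 - (2.0 / 3.0) * d2 ** 2 / d3
    denominator = d3 ** 2 / (12.0 * d2)
    return x - 2.0 * d2 / d3 - cbrt(numerator / denominator)

def third_order_step_stable(x):
    """Same update, rearranged (our assumption) so f''' is never a divisor."""
    d1, d2, d3 = derivs(x)
    p = -2.0 * d2
    q = cbrt(12.0 * d1 * d2 * d3 - 8.0 * d2 ** 3)
    return x - 12.0 * d1 * d2 / (p * p + p * q + q * q)

def iterate(step, x0, iters):
    """Apply the given update map `iters` times starting from x0."""
    x = x0
    for _ in range(iters):
        x = step(x)
    return x

x0 = 1.5
print("classical Newton step:", newton_step(x0))
print("third-order step:     ", third_order_step(x0))
```

For this $f$ the classical update simplifies to $-x_k^3$, so from $x_0 = 1.5$ it jumps to $-3.375$, while the third-order step lands at roughly $-0.28$. Calling `iterate(third_order_step_stable, 3.0, 30)` illustrates convergence from a starting point well outside the classical basin.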

Figure 1: A comparison of one iteration of the classical Newton method and our third-order Newton method applied to the function in (9) starting at $x_0 = 1.5$.

For our $d$th-order Newton methods with $d > 3$, we calculate the radii of convergence numerically. These radii increase with degree as the following table demonstrates:

| Degree $d$ | Radius of Convergence |
|---|---|
| 2 (Classical Newton) | 1 |
| 3 | $\sim 3.4$ |
| 4 | $\sim 4.5$ |
| 5 | $\sim 5.9$ |

We can visualize the speed of convergence of the fifth-order method, for example, in Figure 2. In this figure, we plot $|x_k - x^*|$ starting at $x_0 = 5.9$, which is close to the boundary of the basin. In just five iterations, the method reaches a point with absolute value approximately $10^{-15}$.

Figure 2: 5th-order Newton iterates applied to the function in (9).

Example 2

In this example, we compare our third-order method to the classical Newton method when applied to the function

$$f(x) = 2x\arctan(x) - \log(1 + x^2) + \frac{1}{10}x^2. \tag{10}$$

This is a strongly convex function with its unique minimizer at $x^* = 0$.

In Figure 3, $N_2$ (resp. $N_3$) is the map that takes a point to the corresponding next iterate of the classical (resp. third-order) Newton method. In this example, the third-order method satisfies $|N_3(x)| < |x|$ for all nonzero $x$, implying global convergence of the method. Meanwhile, the classical Newton method oscillates between $\pm 13.494$ when $x_0$ is outside of the range $[-\alpha, \alpha]$, where $\alpha \sim 1.712$ is the point of intersection of the functions $N_2$ and $-x$.

In Figure 4, we can see a comparison of the iterates of the third-order and the classical Newton method starting from the initial condition $x_0 = 1.7$. While both methods converge to the minimizer, the third-order method converges much faster.
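The behavior of the classical Newton map $N_2$ in this example can be reproduced with a few lines of Python. The sketch below uses $f'(x) = 2\arctan(x) + x/5$ and $f''(x) = 2/(1+x^2) + 1/5$, which we derived by hand from (10), and locates the points where $N_2(x) = -x$ by bisection. The brackets $[1, 3]$ and $[10, 20]$ and all function names are our own choices for illustration.

```python
import math

def newton_map(x):
    """Classical Newton map N_2 for f in (10)."""
    d1 = 2.0 * math.atan(x) + x / 5.0   # f'(x)
    d2 = 2.0 / (1.0 + x * x) + 0.2      # f''(x)
    return x - d1 / d2

def two_cycle_gap(x):
    """Zero exactly when N_2(x) = -x, i.e. when x and -x form a 2-cycle."""
    return newton_map(x) + x

def bisect_root(g, lo, hi, iters=200):
    """Plain bisection; assumes g changes sign on [lo, hi]."""
    g_lo = g(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def iterate(x0, iters):
    """Run the classical Newton iteration from x0."""
    x = x0
    for _ in range(iters):
        x = newton_map(x)
    return x

alpha = bisect_root(two_cycle_gap, 1.0, 3.0)    # boundary of the basin
cycle = bisect_root(two_cycle_gap, 10.0, 20.0)  # amplitude of the 2-cycle
print(f"alpha = {alpha:.3f}, cycle amplitude = {cycle:.3f}")
```

Running `iterate(1.7, 60)` starts just inside the basin and ends near zero, while `iterate(2.0, 200)` starts outside and settles onto the oscillation.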

Figure 3: Comparison of the classical Newton map $N_2$ and our third-order Newton map $N_3$ applied to the function in (10). Subfigure (a) implies that the third-order method is globally convergent, while the classical method is not. Subfigure (b) zooms in on the behavior of these maps near the origin to show that the basin of attraction for the classical method is approximately $(-1.712, 1.712)$.

Figure 4: Iterates of our third-order and the classical Newton method applied to the function in (10) starting from a point in the basin of attraction of both methods.

5.2 A Multivariate Example

In our last example, we compare the classical and the third-order Newton methods applied to a standard test function in nonlinear optimization called the Beale function:

$$f(x_1, x_2) = (1.5 - x_1 + x_1 x_2)^2 + (2.25 - x_1 + x_1 x_2^2)^2 + (2.625 - x_1 + x_1 x_2^3)^2.$$

This nonconvex function has a single global minimum at $x^* = (3, 0.5)^T$ and no other local minima. In Figure 5, we explore the behavior of both methods with initial conditions in the region $\{x \in \mathbb{R}^2 \mid \|x\|_\infty \leq 4\}$. We initialize the classical method and our third-order method at a fine grid of points in this box and run both methods for $350$ iterations. For our third-order method, we take the parameter $\varepsilon$ in Algorithm 1 to be equal to $0.01$. In Figure 5, the color yellow corresponds to initial points that converge to $x^*$, and the color blue corresponds to any other behavior including divergence or convergence to a point which is not a local minimum. In this example, the two basins are incomparable, but that of the third-order method is more contiguous and larger in volume.
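The classical half of this experiment is easy to sketch in pure Python. The code below implements the Beale function with a hand-derived gradient and Hessian, a classical Newton step, and a test for convergence to $x^*$ after 350 iterations (the iteration count from the text). The tolerance `tol`, the treatment of an exactly singular Hessian as non-convergence, and all function names are our assumptions. The third-order method is not reproduced here because it requires a semidefinite programming solver.

```python
def beale(x, y):
    """Beale function value at (x, y)."""
    r1 = 1.5 - x + x * y
    r2 = 2.25 - x + x * y * y
    r3 = 2.625 - x + x * y * y * y
    return r1 * r1 + r2 * r2 + r3 * r3

def beale_grad_hess(x, y):
    """Analytic gradient (gx, gy) and Hessian entries (hxx, hxy, hyy)."""
    y2, y3 = y * y, y * y * y
    res = (1.5 - x + x * y, 2.25 - x + x * y2, 2.625 - x + x * y3)
    # Gradient of each residual, then its xy and yy second derivatives (xx is 0).
    jac = ((y - 1.0, x), (y2 - 1.0, 2.0 * x * y), (y3 - 1.0, 3.0 * x * y2))
    sec = ((1.0, 0.0), (2.0 * y, 2.0 * x), (3.0 * y2, 6.0 * x * y))
    gx = gy = hxx = hxy = hyy = 0.0
    for r, (jx, jy), (sxy, syy) in zip(res, jac, sec):
        gx += 2.0 * r * jx
        gy += 2.0 * r * jy
        hxx += 2.0 * jx * jx
        hxy += 2.0 * (jx * jy + r * sxy)
        hyy += 2.0 * (jy * jy + r * syy)
    return (gx, gy), (hxx, hxy, hyy)

def newton_step(x, y):
    """Classical Newton step; returns None if the Hessian is exactly singular."""
    (gx, gy), (hxx, hxy, hyy) = beale_grad_hess(x, y)
    det = hxx * hyy - hxy * hxy
    if det == 0.0:
        return None
    # Solve the 2x2 system H s = g by Cramer's rule, then move to (x, y) - s.
    sx = (hyy * gx - hxy * gy) / det
    sy = (hxx * gy - hxy * gx) / det
    return x - sx, y - sy

def converges_to_minimizer(x, y, iters=350, tol=1e-8):
    """True if the Newton iterates end within tol of x* = (3, 0.5)."""
    for _ in range(iters):
        nxt = newton_step(x, y)
        if nxt is None:
            return False
        x, y = nxt
    return abs(x - 3.0) <= tol and abs(y - 0.5) <= tol
```

A basin picture in the spirit of Figure 5(a) could then be obtained by evaluating `converges_to_minimizer` on a grid over $[-4, 4]^2$ and coloring each grid point by the result; the grid resolution is left to the reader.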

Figure 5: The basins of attraction for (a) the classical and (b) the third-order Newton methods for the minimizer of the Beale function. The basin for the classical method has fractal structure, demonstrating more sensitivity to initialization.

6 Global convergence

In this section, we present a slightly modified algorithm which has global convergence under additional assumptions. There is a vast literature on modifications to Newton’s method that lead to global convergence in special circumstances: see, e.g., [41, 43, 38, 22]. In the setting of our work, it turns out that we can use a result of Nesterov from [40] to show that a simple modification to our algorithm that still has polynomial work per iteration is globally convergent when the Taylor expansion is made to an odd order. (Footnote 8: The reason we need the Taylor expansion order to be odd is that in the work of Nesterov, the Taylor polynomial is regularized by a term of degree one larger. We need this new term to be a polynomial function for sum of squares methods to be readily applicable.) This modified algorithm (Algorithm 2 below) also inherits the local convergence order of Algorithm 1.

As in [40], suppose the $d$th derivative of the function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ that we wish to minimize has a Lipschitz constant $L_d$, and that an upper bound $M$ on $L_d$ is known. In this setting, consider the following algorithm:

Input: $x_0 \in \mathbb{R}^n$
for $k = 0, \dots$ do
      Solve (3) to find $t(x_k)$
      Let $x_{k+1}$ be the minimizer of $T_{x_k,d}(x) + \max\left\{\frac{dM}{(d+1)!}, t(x_k)\right\}\|x - x_k\|^{d+1}$
end for
Algorithm 2: $d$th-order globally convergent Newton method ($d$ odd)

Using the same arguments as those in the proof of Theorem 3, one can see that the next iterate $x_{k+1}$ produced by this algorithm is well-defined whenever $\nabla^2 f(x_k) \succ 0$. Also as before, problem (3) can be solved as a semidefinite program of size polynomial in the dimension. This claim also holds for the problem of finding the (unique) minimizer of the degree $d+1$ polynomial

$$T_{x_k,d} + \max\left\{\frac{dM}{(d+1)!}, t(x_k)\right\}\|x - x_k\|^{d+1}.$$

This is because the polynomials $\|x - x_k\|^{d+1}$ and $T_{x_k,d} + t(x_k)\|x - x_k\|^{d+1}$ are sos-convex and a conic combination of two sos-convex polynomials is sos-convex, making Theorem 2 applicable. Indeed, the polynomial above equals $T_{x_k,d} + t(x_k)\|x - x_k\|^{d+1}$ plus the nonnegative multiple $\left(\max\left\{\frac{dM}{(d+1)!}, t(x_k)\right\} - t(x_k)\right)$ of $\|x - x_k\|^{d+1}$.
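In one dimension the two steps of Algorithm 2 need no semidefinite programming, which makes a small illustrative sketch possible for $d = 3$. The code below applies it to the function in (10). Several ingredients are our assumptions rather than statements from the text: problem (3) is not reproduced here, so we take $t(x_k) = f'''(x_k)^2 / (48 f''(x_k))$, the smallest coefficient that makes the quartic model convex, which matches the cubic coefficient appearing in Proposition 1; we use $M = 4$, obtained by hand from $f''''(x) = (12x^2 - 4)/(1 + x^2)^3$; and the minimizer of the model is found by bisection on its derivative. All names are ours.

```python
import math

M = 4.0  # assumed bound on the Lipschitz constant of f''' for (10)

def f(x):
    """The function in (10); log1p keeps the value accurate near x = 0."""
    return 2.0 * x * math.atan(x) - math.log1p(x * x) + 0.1 * x * x

def derivs(x):
    """f', f'', f''' for (10), derived by hand."""
    u = 1.0 + x * x
    return 2.0 * math.atan(x) + 0.2 * x, 2.0 / u + 0.2, -4.0 * x / (u * u)

def regularized_step(x):
    """One d = 3 step: minimize T_{x,3}(x + s) + c * s**4 over s."""
    d1, d2, d3 = derivs(x)
    t = d3 * d3 / (48.0 * d2)                 # assumed univariate t(x)
    c = max(3.0 * M / math.factorial(4), t)   # max{dM/(d+1)!, t(x)} with d = 3

    def slope(s):
        """Derivative of the quartic model, a nondecreasing cubic in s."""
        return d1 + d2 * s + 0.5 * d3 * s * s + 4.0 * c * s * s * s

    if d1 == 0.0:
        return x
    # Bracket the root on the side opposite to the sign of f'(x), then bisect.
    far = -math.copysign(1.0, d1)
    while slope(far) * d1 > 0.0:
        far *= 2.0
    lo, hi = (far, 0.0) if far < 0.0 else (0.0, far)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return x + 0.5 * (lo + hi)

def run(x0, iters):
    """Return the list of iterates x_0, ..., x_iters."""
    xs = [x0]
    for _ in range(iters):
        xs.append(regularized_step(xs[-1]))
    return xs
```

Calling `run(50.0, 100)` starts far from the minimizer; the function values along the returned iterates should be nonincreasing, in line with the monotonicity used in the proof of Theorem 5 below.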

Theorem 5.

Suppose $f: \mathbb{R}^n \rightarrow \mathbb{R}$ has bounded level sets, a positive definite Hessian everywhere, and the Lipschitz constant of its $d$th derivative bounded above by $M$. (Footnote 9: The assumptions that we make here are the same as those in [40] except that our assumption of positive definiteness of the Hessian is stronger than the assumption of positive semidefiniteness of the Hessian made in [40].) Then, the iterates of Algorithm 2 starting from any $x_0 \in \mathbb{R}^n$ converge to the (unique) minimizer of $f$. Furthermore, Algorithm 2 has local convergence rate of order $d$.

Proof.

Since the Hessian of $f$ is positive definite everywhere, the function $f$ is strictly convex. This, along with boundedness of the level sets, implies that $f$ has a unique (global) minimizer which we call $x^*$.

Define $\psi_{x_k,d}(x) := T_{x_k,d}(x) + \max\left\{\frac{dM}{(d+1)!}, t(x_k)\right\}\|x - x_k\|^{d+1}$. By Theorem 1 from [40], we have $\psi_{x_k,d}(x) \geq f(x)$ for all $x \in \mathbb{R}^n$, thus the method is monotone; i.e., $f(x_{k+1}) \leq \psi_{x_k,d}(x_{k+1}) \leq \psi_{x_k,d}(x_k) = f(x_k)$. Let $M_k := \max\left\{M, \frac{(d+1)!\,t(x_k)}{d}\right\}$ and $\delta_k := f(x_k) - f(x^*)$. Since the set $\{x \in \mathbb{R}^n \mid f(x) \leq f(x_0)\}$ is compact and the method is monotone, there exists a scalar $D$ such that $\|x_k - x^*\| \leq D$ for all $k$. By the arguments in the proof of Theorem 2 from [40], we can conclude that

$$\delta_k - \delta_{k+1} \geq C_k \delta_k^{\frac{d+1}{d}},$$

where $C_k := \frac{d}{d+1}\left(\frac{d!}{(dM_k + L_d)D^{d+1}}\right)^{\frac{1}{d}}$.

By Lemma 3, we know that

$$t_{\max} := \sup_{\|x - x^*\| \leq D} t(x)$$

is finite. Letting $M_{\max} := \max\left\{M, \frac{(d+1)!\,t_{\max}}{d}\right\}$, we have $M_k \leq M_{\max}$, and therefore $C_k \geq \frac{d}{d+1}\left(\frac{d!}{(dM_{\max} + L_d)D^{d+1}}\right)^{\frac{1}{d}}$ for all $k$. Continuing the argument from the proof of Theorem 2 from [40], we can conclude that

$$f(x_k) - f(x^*) \leq \frac{(dM_{\max} + L_d)D^{d+1}}{d!}\left(\frac{d+1}{k}\right)^d.$$

Thus, we have $f(x_k) - f(x^*) \rightarrow 0$ and therefore $x_k \rightarrow x^*$.

For the local superlinear convergence rate, it suffices to show that for $x_k$ close enough to $x^*$, we have

$$\|x_{k+1} - x^*\| \leq c'\|x_k - x^*\|^d$$

for some constant $c'$. Let $r_1$ and $r_2$ be as in the proof of Theorem 4, $r' := \min\{r_1, r_2\}$, and $\Omega := \{x \in \mathbb{R}^n \mid \|x - x^*\| \leq r'\}$. By the arguments in the proof of Theorem 4, for every $x_k \in \Omega$, we have

$$\begin{aligned}
\|\nabla\psi_{x_k,d}(x^*)\| &\leq \frac{L_d}{d!}\|x^* - x_k\|^d + \max\left\{\frac{dM}{(d+1)!}, t(x_k)\right\}(d+1)\|x^* - x_k\|^d \\
&\leq \frac{L_d}{d!}\|x^* - x_k\|^d + \max\left\{\frac{dM}{(d+1)!}, \sup_{x \in \Omega} t(x)\right\}(d+1)\|x^* - x_k\|^d.
\end{aligned}$$

Substituting into (8) (with $x_0$ replaced with $x_k$), we have

$$\|x_{k+1} - x^*\| \leq c'\|x^* - x_k\|^d,$$

where

$$c' := \frac{4((d-1)^2 - 1)}{\lambda_{\min}\nabla^2 f(x^*)}\left(\frac{L_d}{d!} + \max\left\{\frac{dM}{(d+1)!}, \sup_{x \in \Omega} t(x)\right\}(d+1)\right).$$

We note that by Lemma 3, $\sup_{x \in \Omega} t(x)$ is finite. ∎

7 Future directions

Besides the question of extending the results of Section 6 to the case of $d$ even, there are a few other potential directions for future research that we wish to highlight:

  • Can we replace the SDPs used in Algorithm 1 with more scalable conic programs such as linear programs (LPs) or second-order cone programs (SOCPs)? There has been work (see, e.g., [4]) on replacing methods based on sos programming with LP or SOCP-based approaches that rely on more tractable subsets of sos polynomials, such as the so-called diagonally dominant sum of squares (dsos) or scaled diagonally dominant sum of squares (sdsos) polynomials. In our setting, we might wish to replace the constraint in (3) (or (5)) that a polynomial is sos-convex with a constraint that it is “dsos-convex” or “sdsos-convex” (see, e.g., [3]). The results in [3] on the difference of dsos-convex decompositions of arbitrary polynomials could be explored to potentially replace the first SDP in each iteration of Algorithm 1 with an LP or SOCP. One would then need to establish an appropriate dsos or sdsos version of Theorem 2 to replace our second SDP with an LP or SOCP. It would be interesting to compare the factor of convergence of such an algorithm to that of the SDP-based approach.

  • Can we create a method that uses a sparse subset of higher-order derivatives of the function $f$ and that perhaps approximates the remaining derivatives in order to speed up each iteration? Such a method would be a higher-order analogue to the so-called “quasi-Newton” methods which rely on approximations of the Hessian of $f$ (see, e.g., [42, Chap. 6]). An example of such a higher-order quasi-Newton method which results in semidefinite programs of small size in each iteration has been proposed in [2], but its convergence properties are currently unknown.

  • Can we use our method or a modification thereof to solve systems of nonlinear equations (in a way that is superior to simply minimizing the sum of the squares of the equations)? The classical Newton method and its variants can be used for this purpose (see, e.g., [42, Sect. 11.1]). What are the right higher-order analogues of these approaches?

  • Each iteration of the algorithms that we have presented in this paper can be interpreted as running just one iteration of the so-called “convex-concave procedure” (see, e.g., [34]) to a particular difference of convex decomposition of the Taylor expansion of $f$. Are there benefits of working with alternative difference of convex decompositions (see, e.g., [3]) of the Taylor expansion, or running more iterations of the convex-concave procedure before the Taylor polynomial is updated?

Acknowledgements

We would like to thank Jean-Bernard Lasserre for insightful discussions around the results in [32].

References

  • [1] N. Agarwal and E. Hazan. Lower bounds for higher-order convex optimization. In Proceedings of the 31st Conference On Learning Theory, volume 75 of Proceedings of Machine Learning Research, pages 774–792, 2018.
  • [2] A. A. Ahmadi, C. Dibek, and G. Hall. Sums of separable and quadratic polynomials. Mathematics of Operations Research, 48, 2022.
  • [3] A. A. Ahmadi and G. Hall. DC decomposition of nonconvex polynomials with algebraic techniques. Mathematical Programming, 169(1):69–94, 2018.
  • [4] A. A. Ahmadi and A. Majumdar. DSOS and SDSOS optimization: More tractable alternatives to sum of squares and semidefinite optimization. SIAM Journal on Applied Algebra and Geometry, 3(2):193–230, 2019.
  • [5] A. A. Ahmadi, A. Olshevsky, P. A. Parrilo, and J. N. Tsitsiklis. NP-hardness of deciding convexity of quartic polynomials and related problems. Mathematical Programming, 137:453–476, 2013.
  • [6] A. A. Ahmadi and P. A. Parrilo. A complete characterization of the gap between convexity and sos-convexity. SIAM Journal on Optimization, 23(2):811–833, 2013.
  • [7] A. A. Ahmadi and J. Zhang. Complexity aspects of local minima and related notions. Advances in Mathematics, 397:108119, 2022.
  • [8] A. A. Ahmadi and J. Zhang. On the complexity of finding a local minimizer of a quadratic function over a polytope. Mathematical Programming, 195(1-2):783–792, 2022.
  • [9] M. Baes. Estimate sequence methods: extensions and approximations. Institute for Operations Research, ETH, Zürich, Switzerland, 2(1), 2009.
  • [10] E. G. Belousov and D. Klatte. A Frank–Wolfe type theorem for convex polynomial programs. Computational Optimization and Applications, 22(1):37–48, 2002.
  • [11] E. G. Birgin, J. L. Gardenghi, J. M. Martínez, S. A. Santos, and P. L. Toint. Worst-case evaluation complexity for unconstrained nonlinear optimization using high-order regularized models. Mathematical Programming, 163(1):359–368, 2017.
  • [12] S. Bubeck, Q. Jiang, Y. T. Lee, Y. Li, and A. Sidford. Near-optimal method for highly smooth convex optimization. In Conference on Learning Theory, pages 492–507. Proceedings of Machine Learning Research, 2019.
  • [13] Y. Carmon, J. C. Duchi, O. Hinder, and A. Sidford. Lower bounds for finding stationary points I. Mathematical Programming, 184(1):71–120, 2020.
  • [14] C. Cartis, N. I. Gould, and P. L. Toint. Universal regularization methods: varying the power, the smoothness and the accuracy. SIAM Journal on Optimization, 29(1):595–615, 2019.
  • [15] C. Cartis, N. I. Gould, and P. L. Toint. A concise second-order complexity analysis for unconstrained optimization using high-order regularized models. Optimization Methods and Software, 35(2):243–256, 2020.
  • [16] C. Cartis, N. I. Gould, and P. L. Toint. Sharp worst-case evaluation complexity bounds for arbitrary-order nonconvex optimization with inexpensive constraints. SIAM Journal on Optimization, 30(1):513–541, 2020.
  • [17] C. Cartis, N. I. Gould, and P. L. Toint. Evaluation Complexity of Algorithms for Nonconvex Optimization: Theory, Computation and Perspectives. SIAM, 2022.
  • [18] C. Cartis and W. Zhu. Second-order methods for quartically-regularised cubic polynomials, with applications to high-order tensor methods. arXiv preprint arXiv:2308.15336, 2023.
  • [19] C. Cartis and W. Zhu. Global convergence of high-order regularization methods with sums-of-squares Taylor models. arXiv preprint arXiv:2404.03035, 2024.
  • [20] P. L. Chebyshev. Polnoe Sobranie Sochinenii. Izd. Akad. Nauk SSSR, 5:7–25, 1951.
  • [21] C. W. Clenshaw and A. R. Curtis. A method for numerical integration on an automatic computer. Numerische Mathematik, 2(1):197–205, 1960.
  • [22] A. Conn, N. Gould, and P. Toint. Trust Region Methods. MPS-SIAM Series on Optimization. Society for Industrial and Applied Mathematics, 2000.
  • [23] N. Doikov. New second-order and tensor methods in convex optimization. PhD thesis, Université catholique de Louvain, 2021.
  • [24] N. Doikov and Y. Nesterov. Local convergence of tensor methods. Mathematical Programming, 193(1):315–336, 2022.
  • [25] G. N. Grapiglia and Y. Nesterov. Tensor methods for finding approximate stationary points of convex functions. Optimization Methods and Software, 37(2):605–638, 2022.
  • [26] J. W. Helton and J. Nie. Semidefinite representation of convex sets. Mathematical Programming, 122:21–64, 2010.
  • [27] J. P. Imhof. On the method for numerical integration of Clenshaw and Curtis. Numerische Mathematik, 5(1):138–141, 1963.
  • [28] B. Jiang, T. Lin, and S. Zhang. A unified adaptive tensor approximation scheme to accelerate composite convex optimization. SIAM Journal on Optimization, 30(4):2897–2926, 2020.
  • [29] B. Jiang, H. Wang, and S. Zhang. An optimal high-order tensor method for convex optimization. Mathematics of Operations Research, 46(4):1390–1412, 2021.
  • [30] J.-B. Lasserre. Global optimization with polynomials and the problem of moments. SIAM Journal on Optimization, 11(3):796–817, 2001.
  • [31] J.-B. Lasserre. Representation of nonnegative convex polynomials. Archiv der Mathematik, 91(2):126–130, 2008.
  • [32] J.-B. Lasserre. Convexity in semialgebraic geometry and polynomial optimization. SIAM Journal on Optimization, 19:1995–2014, 2009.
  • [33] K. Levenberg. A method for the solution of certain non-linear problems in least squares. Quarterly of Applied Mathematics, 2(2):164–168, 1944.
  • [34] T. Lipp and S. Boyd. Variations and extension of the convex–concave procedure. Optimization and Engineering, 17(2):263–287, 2016.
  • [35] J. Löfberg. YALMIP: A toolbox for modeling and optimization in MATLAB. In IEEE International Conference on Robotics and Automation, pages 284–289, 2004.
  • [36] A. Majumdar, G. Hall, and A. A. Ahmadi. Recent scalability improvements for semidefinite programming with applications in machine learning, control, and robotics. Annual Review of Control, Robotics, and Autonomous Systems, 3:331–360, 2020.
  • [37] D. W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics, 11(2):431–441, 1963.
  • [38] J. J. Moré. Recent Developments in Algorithms and Software for Trust Region Methods, pages 258–287. Springer Berlin Heidelberg, 1983.
  • [39] K. G. Murty and S. N. Kabadi. Some NP-complete problems in quadratic and nonlinear programming. Mathematical Programming, 39(2):117–129, 1987.
  • [40] Y. Nesterov. Implementable tensor methods in unconstrained convex optimization. Mathematical Programming, 186(1):157–183, 2021.
  • [41] Y. Nesterov and B. T. Polyak. Cubic regularization of Newton method and its global performance. Mathematical Programming, 108(1):177–205, 2006.
  • [42] J. Nocedal and S. J. Wright. Numerical Optimization. Springer, 2006.
  • [43] J. M. Ortega and W. C. Rheinboldt. Iterative Solution of Nonlinear Equations in Several Variables. SIAM, 2000.
  • [44] P. A. Parrilo. Structured semidefinite programs and semialgebraic geometry methods in robustness and optimization. PhD thesis, California Institute of Technology, 2000.
  • [45] S. Prajna, A. Papachristodoulou, and P. A. Parrilo. Introducing SOSTOOLS: A general purpose sum of squares programming solver. In Proceedings of the 41st IEEE Conference on Decision and Control, volume 1, pages 741–746, 2002.
  • [46] I. Pólik and T. Terlaky. A survey of the S-lemma. SIAM Review, 49(3):371–418, 2007.
  • [47] O. Silina and J. Zhang. An unregularized third order Newton method. arXiv preprint arXiv:2209.10051, 2022.
  • [48] L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM Review, 38(1):49–95, 1996.
  • [49] A. Yurtsever, J. A. Tropp, O. Fercoq, M. Udell, and V. Cevher. Scalable semidefinite programming. SIAM Journal on Mathematics of Data Science, 3(1):171–200, 2021.