License: CC BY 4.0
arXiv:2403.06930v1 [math.OC] 11 Mar 2024

Heavy Ball Momentum for Non-Strongly Convex Optimization

J.-F. Aujol (Univ. Bordeaux, Bordeaux INP, CNRS, IMB, UMR 5251, F-33400 Talence, France), C. Dossal (IMT, Univ. Toulouse, INSA Toulouse, Toulouse, France), H. Labarrière (MaLGa, DIBRIS, Università di Genova, Genoa, Italy), A. Rondepierre (IMT, Univ. Toulouse, INSA Toulouse, Toulouse, France; LAAS, Univ. Toulouse, CNRS, Toulouse, France)
(March 11, 2024)
Abstract

When considering the minimization of a quadratic or strongly convex function, it is well known that first-order methods involving an inertial term weighted by a constant-in-time parameter are particularly efficient (see Polyak [32], Nesterov [28], and references therein). By setting the inertial parameter according to the condition number of the objective function, these methods guarantee a fast exponential decay of the error. We prove that schemes of this type (hereafter called Heavy Ball schemes) remain relevant in a relaxed setting, i.e. for composite functions satisfying a quadratic growth condition. In particular, we adapt V-FISTA, introduced by Beck in [10] for strongly convex functions, to this broader class of functions. To the authors’ knowledge, the resulting worst-case convergence rates are faster than any other in the literature, including those of FISTA restart schemes. No assumption on the set of minimizers is required, and guarantees are also given in the non-optimal case, i.e. when the condition number is not exactly known. This analysis follows the study of the corresponding continuous-time dynamical system (the Heavy Ball with friction system), for which new convergence results for the trajectories are shown.

1 Introduction

In many image processing or statistical problems, one needs to minimize a convex function $F$ from $\mathbb{R}^N$ to $\mathbb{R}\cup\{+\infty\}$ with a nonempty set of minimizers. In this context, when $N$ is large (i.e. for large-scale problems), second-order algorithms cannot be used, and only the gradient or a subgradient of $F$ can be computed to build a minimizing sequence $(x_n)_{n\in\mathbb{N}}$.

If $F$ is convex, differentiable and has an $L$-Lipschitz gradient, the explicit gradient descent algorithm (GD) with step $s=\frac{1}{L}$, defined by

$$x_{n+1}=x_n-s\nabla F(x_n),$$

is a simple first-order algorithm that provides a sequence converging to a minimizer $x^*$ of $F$. This method is, however, slow on this class of convex functions, since its asymptotic convergence rate is

$$F(x_n)-F(x^*)=\mathcal{O}\left(n^{-1}\right).$$

This decay rate can be improved for $\mu$-strongly convex functions, since the worst-case guarantee is then

$$F(x_n)-F(x^*)=\mathcal{O}\left(e^{-\frac{\mu}{L}n}\right). \tag{1}$$

This asymptotic decay is faster than the one obtained for convex functions, but when $\kappa:=\frac{\mu}{L}\ll 1$ it can still be slow in practice. Since $\kappa$ is the inverse of the condition number of $F$, GD is particularly slow on ill-conditioned problems, as is often the case in large-scale settings.
Two remarks can be made about these decays. First, if $F$ is not differentiable but composite, GD can be replaced by the Forward-Backward algorithm and the two decays above remain valid. We provide an exact definition of composite functions and of the Forward-Backward algorithm in Section 2. Second, the above exponential decay of the error is stated under a strong convexity assumption, but it can be extended to weaker hypotheses such as a quadratic growth condition.
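As a minimal illustration of the role of $\kappa$ in (1), the sketch below runs GD with $s=\frac{1}{L}$ on a toy quadratic $F(x)=\frac12 x^\top A x$ with $A=\mathrm{diag}(1,100)$, so that $\mu=1$, $L=100$ and $\kappa=0.01$. This test problem is a hypothetical choice of ours, not taken from the paper. The stiff coordinate is eliminated in one step, while the flat one is only multiplied by $1-\kappa$ at each iteration, so after 500 iterations the error is still of order $10^{-5}$.

```python
import numpy as np

def gradient_descent(grad, x0, L, n_iter):
    """Explicit gradient descent x_{n+1} = x_n - s * grad(x_n) with step s = 1/L."""
    s = 1.0 / L
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_iter):
        x = x - s * grad(x)
    return x

# Hypothetical toy problem: F(x) = 0.5 * x^T A x, with mu = 1, L = 100, kappa = mu / L = 0.01.
A = np.diag([1.0, 100.0])
F = lambda x: 0.5 * x @ A @ x
grad_F = lambda x: A @ x

x_gd = gradient_descent(grad_F, np.array([1.0, 1.0]), L=100.0, n_iter=500)
# The flat coordinate is multiplied by exactly (1 - kappa) at each step,
# so F(x_500) - F* = 0.5 * 0.99**1000, roughly 2e-5.
print(F(x_gd))
```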

In 1964, Polyak introduces the Heavy Ball (HB) scheme, inspired by mechanics, which improves on the decay of gradient descent on the class of $C^2$ strongly convex functions by incorporating inertia. This scheme generates a sequence of iterates $(x_n)_{n\in\mathbb{N}}$ ensuring that:

$$F(x_n)-F(x^*)=\mathcal{O}\left(e^{-4\sqrt{\kappa}n}\right). \tag{2}$$

If $\kappa\ll 1$, this convergence rate is significantly faster than the rate (1) guaranteed by the Forward-Backward algorithm, and this theoretical improvement is reflected in better practical performance. At the core of Polyak’s analysis is the fact that, in the neighborhood of its unique minimizer, $F$ behaves like a quadratic function. The $C^2$ assumption is therefore crucial in Polyak’s analysis, and examples of simple $C^1$ strongly convex functions $F$ for which (HB) provides diverging sequences can be found in [21].
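To make the gap between (1) and (2) concrete, here is a minimal sketch of Polyak's iteration $x_{n+1}=x_n-a\nabla F(x_n)+b(x_n-x_{n-1})$ (gradient evaluated at $x_n$) with the classical tuning for $C^2$ quadratics, $a=\frac{4}{(\sqrt{L}+\sqrt{\mu})^2}$ and $b=\left(\frac{\sqrt{L}-\sqrt{\mu}}{\sqrt{L}+\sqrt{\mu}}\right)^2$. The toy quadratic $\mathrm{diag}(1,100)$ is a hypothetical example of ours; on it, 100 Heavy Ball iterations reach an accuracy far below what 100 GD iterations achieve. This tuning is meant for quadratics only; as noted above, it may fail on $C^1$ functions.

```python
import numpy as np

def polyak_heavy_ball(grad, x0, L, mu, n_iter):
    """Polyak's Heavy Ball with the classical quadratic tuning (a, b)."""
    sL, smu = np.sqrt(L), np.sqrt(mu)
    a = 4.0 / (sL + smu) ** 2                    # step size
    b = ((sL - smu) / (sL + smu)) ** 2           # constant momentum weight
    x_prev = np.asarray(x0, dtype=float).copy()
    x = x_prev.copy()                            # x_{-1} = x_0: start at rest
    for _ in range(n_iter):
        x, x_prev = x - a * grad(x) + b * (x - x_prev), x
    return x

def gradient_descent(grad, x0, L, n_iter):
    """Plain GD with step 1/L, for comparison."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_iter):
        x = x - grad(x) / L
    return x

# Hypothetical toy quadratic: F(x) = 0.5 * x^T A x with mu = 1, L = 100 (kappa = 0.01).
A = np.diag([1.0, 100.0])
F = lambda x: 0.5 * x @ A @ x
grad_F = lambda x: A @ x
x0 = np.array([1.0, 1.0])

F_hb = F(polyak_heavy_ball(grad_F, x0, L=100.0, mu=1.0, n_iter=100))
F_gd = F(gradient_descent(grad_F, x0, L=100.0, n_iter=100))
print(F_hb, F_gd)
```

On this example each error component decays like $(1+cn)\left(\frac{9}{11}\right)^n$ for Heavy Ball, against $0.99^n$ for the flat GD component.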

In 1983 Nesterov [27] proposes an inertial scheme built to speed up the convergence of GD on the class of convex functions. This acceleration process is at the core of FISTA introduced by Beck and Teboulle [11], which applies to composite functions and provides a sequence such that

$$F(x_n)-F(x^*)=\mathcal{O}\left(n^{-2}\right).$$

The details of this algorithm and its convergence rates are given in Section 2. The main difference between the Heavy Ball algorithm and the Nesterov scheme lies in the inertia parameter: for Heavy Ball it is constant over iterations and depends on $\kappa:=\frac{\mu}{L}$, while for FISTA it depends on the iteration number and tends to 1 as $n$ goes to $+\infty$.

Many variations of these schemes have been proposed during the last decade (see Table 1 for various examples), and the behavior, rates and stability of these schemes are now well understood. A common approach is to study an associated dynamical system via a Lyapunov analysis before deriving convergence results for the scheme, see e.g. [9] and the references therein.

Several Heavy Ball schemes have been proposed to provide fast decays of type (2) under weaker hypotheses than $C^2$ regularity and strong convexity [28, 10, 35, 37, 36]. But for all these schemes, a fast geometric decay such as (2) is achieved only on classes of functions having a unique minimizer: no known inertial scheme achieves such rates on the class of convex functions satisfying a simple quadratic growth condition,

$$\exists\mu>0,\ \forall x\in\mathbb{R}^N,\ \forall x^*\in X^*,\quad F(x)-F(x^*)\geqslant\frac{\mu}{2}d(x,X^*)^2, \tag{3}$$

or, equivalently in the convex setting, a Łojasiewicz property with parameter $\theta=\frac{1}{2}$, without an additional uniqueness hypothesis. In other words, no known inertial scheme provides better asymptotic bounds than (GD) within the class of convex functions satisfying a quadratic growth condition. Thus, it remains unclear whether inertia brings any real benefit for this class of functions.

The main contribution of this work is to provide Heavy Ball schemes, similar to Beck’s V-FISTA, ensuring rates of $\mathcal{O}(e^{-c\sqrt{\kappa}n})$ on the class of convex functions satisfying a quadratic growth condition, where the value of $c$ will be specified later. The inertia parameter requires knowledge of $L$ and $\mu$, but the new scheme guarantees exponential decay even if $\kappa=\frac{\mu}{L}$ is overestimated: we prove that an overestimation of $\kappa$ only results in a suboptimal exponential decay. Theorem 1 provides a straightforward Lyapunov analysis and a fast convergence rate for a given friction parameter, while Theorem 2 yields rates that can be achieved even if the friction parameter is not set optimally, demonstrating that fast exponential decay is robust to a mild overestimation of the quadratic growth parameter $\mu$.

The paper is organized as follows. In Section 2 we introduce the main geometric assumption made on the function to minimize, namely the quadratic growth condition, and review the literature on the convergence rates of inertial algorithms under this condition. Section 3 is devoted to our two main theorems and a corollary proving that Heavy Ball like methods can be properly parameterized to achieve fast exponential decay on this class of functions. Section 4 presents the continuous counterpart of the discrete analysis of Section 3, which serves as a guide to construct the proofs of the theorems presented in Section 3, as well as new results on the convergence rate of the trajectories of the Heavy Ball dynamical system. All the proofs are gathered in Section 5, and the more technical ones are detailed in the Appendix.

2 Geometry of convex functions and inertial algorithms: definitions and state of the art.

Let us first recall some basic notations and definitions. We assume that $\mathbb{R}^N$ is equipped with the Euclidean scalar product $\langle\cdot,\cdot\rangle$ and the associated norm $\|\cdot\|$. As usual, $B(x^*,r)$ denotes the open Euclidean ball with center $x^*\in\mathbb{R}^N$ and radius $r>0$. For any subset $X\subset\mathbb{R}^N$, the Euclidean distance $d$ to $X$ is defined as:

$$\forall x\in\mathbb{R}^N,\quad d(x,X)=\inf_{y\in X}\|x-y\|.$$

2.1 Framework and notations

In this paper we focus on the class of composite functions $F=f+h$, where $f$ is a convex, differentiable function having an $L$-Lipschitz gradient and $h$ is a proper lower semicontinuous (l.s.c.) convex function whose proximal operator is known. The proximal operator of $h$ is denoted by $\text{prox}_h$ and defined by:

$$\text{prox}_h(x)=\operatorname{argmin}_{y\in\mathbb{R}^N}\left(h(y)+\frac{1}{2}\|y-x\|^2\right). \tag{4}$$
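For a concrete instance of (4), take $h=t\|\cdot\|_1$ for some $t>0$ (an example choice of ours; the paper keeps $h$ generic). Its proximal operator has the well-known closed form of componentwise soft-thresholding, $\text{prox}_{t\|\cdot\|_1}(x)_i=\operatorname{sign}(x_i)\max(|x_i|-t,0)$. The sketch below evaluates it and checks against random perturbations that it does minimize the objective in (4).

```python
import numpy as np

def prox_l1(x, t):
    """prox of t * ||.||_1, i.e. argmin_y t*||y||_1 + 0.5*||y - x||^2 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_objective(y, x, t):
    """Objective minimized in definition (4), with h = t * ||.||_1."""
    return t * np.sum(np.abs(y)) + 0.5 * np.sum((y - x) ** 2)

x = np.array([3.0, -0.5, -2.0])
p = prox_l1(x, 1.0)   # components with |x_i| <= 1 are set to zero, the others shrink by 1

# Sanity check: no nearby point does better than p (the objective is 1-strongly convex).
rng = np.random.default_rng(0)
perturbed = [prox_objective(p + 0.1 * rng.standard_normal(3), x, 1.0) for _ in range(1000)]
print(p, prox_objective(p, x, 1.0), min(perturbed))
```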

For this class of functions, a classical minimization algorithm is the Forward-Backward algorithm (FB), whose iterations are described by:

$$x_0\in\mathbb{R}^N,\quad x_{n+1}=\text{prox}_{sh}\left(x_n-s\nabla f(x_n)\right),\quad s\in\left(0,\frac{2}{L}\right). \tag{5}$$
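A minimal sketch of the Forward-Backward iteration (5), assuming as an illustrative (hypothetical) instance $f(x)=\frac12\|Ax-b\|^2$ with random data and $h=\lambda\|\cdot\|_1$, so that $\nabla f(x)=A^\top(Ax-b)$, $L=\|A\|_2^2$ and the backward step is soft-thresholding. With $s=\frac{1}{L}$ the function values are non-increasing along the iterates, which is what the check below looks at.

```python
import numpy as np

def soft_threshold(v, t):
    """prox of t * ||.||_1: componentwise shrinkage towards zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def forward_backward(A, b, lam, n_iter, s=None):
    """FB (5) for F(x) = 0.5*||Ax - b||^2 + lam*||x||_1 (example choice of f and h)."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of grad f(x) = A^T (Ax - b)
    s = 1.0 / L if s is None else s        # any s in (0, 2/L) is allowed by (5)
    x = np.zeros(A.shape[1])
    history = []
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)                     # forward (explicit gradient) step on f
        x = soft_threshold(x - s * grad, s * lam)    # backward (proximal) step on h
        history.append(0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x)))
    return x, np.array(history)

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
b = rng.standard_normal(20)
x_fb, F_hist = forward_backward(A, b, lam=0.5, n_iter=2000)
print(F_hist[0], F_hist[-1], np.count_nonzero(x_fb))
```

Here `np.linalg.norm(A, 2)` returns the largest singular value of `A`, hence $L=\|A\|_2^2$ is the largest eigenvalue of $A^\top A$.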

Without further assumptions on $F$, the convergence of the FB algorithm, i.e. the decay of $F(x_n)-F^*$ along the iterates, may be slow. In this paper we are interested in inertial methods, which are among the most effective first-order optimization methods and may ensure better convergence rates, especially when $F$ is additionally strongly convex:

Definition 1 (Strong convexity $\mathcal{S}_\mu$).

Let $F:\mathbb{R}^N\rightarrow\mathbb{R}\cup\{+\infty\}$ be a proper lower semicontinuous convex function. The function $F$ is said to be $\mu$-strongly convex for some real constant $\mu>0$ if the function $x\mapsto F(x)-\frac{\mu}{2}\|x\|^2$ is convex.

Weakening this assumption, we consider the class of convex composite functions satisfying some quadratic growth condition:

Definition 2 (Quadratic growth condition $\mathcal{G}^2_\mu$).

Let $F:\mathbb{R}^N\rightarrow\mathbb{R}\cup\{+\infty\}$ be a proper lower semicontinuous convex function such that $X^*=\operatorname{argmin}F\neq\emptyset$, and let $F^*=\min F$. The function $F$ satisfies a quadratic growth condition $\mathcal{G}^2_\mu$ for some real constant $\mu>0$ if:

$$\forall x\in\mathbb{R}^N,\quad F(x)-F^*\geqslant\frac{\mu}{2}d(x,X^*)^2. \tag{6}$$
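A standard example of a function satisfying $\mathcal{G}^2_\mu$ without being strongly convex is a rank-deficient least-squares objective $F(x)=\frac12\|Ax-b\|^2$: then $X^*=A^+b+\ker A$ is an affine subspace (so minimizers are not unique), and $\mu$ can be taken as the smallest nonzero eigenvalue of $A^\top A$. The sketch below uses hypothetical random data with prescribed singular values and checks inequality (6) at random points; $A^+$ denotes the Moore-Penrose pseudo-inverse, and the projection of $x$ onto $X^*$ is $x-A^+(Ax-b)$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 30, 10, 3                                  # rank r < n: F is not strongly convex
U, _ = np.linalg.qr(rng.standard_normal((m, r)))     # m x r, orthonormal columns
V, _ = np.linalg.qr(rng.standard_normal((n, r)))     # n x r, orthonormal columns
sing = np.array([5.0, 2.0, 0.5])                     # nonzero singular values of A
A = U @ np.diag(sing) @ V.T
A_pinv = V @ np.diag(1.0 / sing) @ U.T               # Moore-Penrose pseudo-inverse
b = rng.standard_normal(m)
mu = sing.min() ** 2                                 # smallest nonzero eigenvalue of A^T A

F = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
x_star = A_pinv @ b                                  # one minimizer (the minimal-norm one)
F_star = F(x_star)

def dist_to_argmin(x):
    """d(x, X*) with X* = A^+ b + ker(A); the projection of x onto X* is x - A^+(Ax - b)."""
    return np.linalg.norm(A_pinv @ (A @ x - b))

# Minimizers are not unique: moving along ker(A) does not change F.
v = rng.standard_normal(n)
v -= V @ (V.T @ v)                                   # project v onto ker(A)
print(F(x_star + v) - F_star)                        # zero up to rounding

# Check the quadratic growth inequality (6) at random points.
points = 3.0 * rng.standard_normal((200, n))
gaps = np.array([F(x) - F_star for x in points])
bounds = np.array([0.5 * mu * dist_to_argmin(x) ** 2 for x in points])
print(np.min(gaps - bounds))                         # nonnegative up to rounding
```

The inequality holds because $F(x)-F^*=\frac12\|A(x-\bar x)\|^2$ with $x-\bar x$ orthogonal to $\ker A$.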

Classically, the quadratic growth condition $\mathcal{G}^2_\mu$ can be seen as a relaxation of strong convexity. Note that, unlike strong convexity, a growth condition does not impose the uniqueness of the minimizer. In the convex setting, the quadratic growth condition $\mathcal{G}^2_\mu$ is equivalent to a global Łojasiewicz property with exponent $\frac{1}{2}$ [20]. The Łojasiewicz property [24, 25] is a key tool in the mathematical analysis of continuous and discrete dynamical systems, initially introduced to prove the convergence of the trajectories of the gradient flow of analytic functions. An extension to nonsmooth functions has been proposed by Bolte et al. in [13]:

Definition 3.

Let $F:\mathbb{R}^N\rightarrow\mathbb{R}\cup\{+\infty\}$ be a proper lower semicontinuous convex function with $X^*=\operatorname{argmin}F\neq\emptyset$. The function $F$ has a Łojasiewicz property of exponent $\theta\in[0,1)$ if, for any minimizer $x^*\in X^*$, there exist real constants $c_\ell>0$ and $\varepsilon>0$ such that:

$$\forall x\in B(x^*,\varepsilon),\quad\left(F(x)-F(x^*)\right)^{\theta}\leqslant c_\ell\, d(0,\partial F(x)). \tag{7}$$

The Łojasiewicz property is said to be global if (7) is satisfied for any $x\in\mathbb{R}^N$.
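One direction of the equivalence between $\mathcal{G}^2_\mu$ and the Łojasiewicz property with $\theta=\frac{1}{2}$ quoted from [20] follows from a short computation, sketched here for completeness; the constant $c_\ell=\sqrt{2/\mu}$ is what this particular argument yields, not necessarily the sharpest one. Let $x\in\mathbb{R}^N$ with $F(x)>F^*$, let $\bar x$ be the projection of $x$ onto the nonempty closed convex set $X^*$, and let $g\in\partial F(x)$:

```latex
\begin{aligned}
F^* = F(\bar x) &\geqslant F(x) + \langle g, \bar x - x\rangle
  && \text{(convexity, } g\in\partial F(x))\\
\Longrightarrow\quad F(x) - F^* &\leqslant \|g\|\, d(x,X^*)
  && \text{(Cauchy-Schwarz, } \|x-\bar x\| = d(x,X^*))\\
&\leqslant \|g\| \sqrt{\tfrac{2}{\mu}\,\big(F(x)-F^*\big)}
  && \text{(quadratic growth (6))}\\
\Longrightarrow\quad \big(F(x)-F^*\big)^{1/2} &\leqslant \sqrt{\tfrac{2}{\mu}}\; \|g\|.
\end{aligned}
```

Taking the infimum over $g\in\partial F(x)$ gives (7) with $\theta=\frac{1}{2}$ and $c_\ell=\sqrt{2/\mu}$ at every $x$ (the case $F(x)=F^*$ being trivial), i.e. a global Łojasiewicz property.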

A general inertial optimization method can be described as follows:

$$\forall n\in\mathbb{N},\quad\left\{\begin{aligned} y_n&=x_n+\alpha_n(x_n-x_{n-1}),\\ x_{n+1}&=\text{prox}_{sh}\left(y_n-s\nabla f(z_n)\right),\end{aligned}\right. \tag{8}$$

where $\alpha_n\geqslant 0$ denotes some friction parameter and $z_n=x_n$ or $z_n=y_n$ depending on the method considered. Historically, in his seminal work [32], B.T. Polyak proposes a first inertial scheme, choosing a constant friction parameter $\alpha_n=\alpha$ and $z_n=x_n$, for the minimization of $C^2$ strongly convex functions. One of the most popular inertial algorithms is FISTA (Fast Iterative Shrinkage-Thresholding Algorithm), introduced by Beck and Teboulle in [11] to minimize convex composite functions. Inspired by Nesterov’s accelerated gradient method proposed in [27], its friction parameter $\alpha_n$ is defined by:

$$\alpha_n=\frac{t_{n-1}-1}{t_n},\qquad z_n=y_n, \tag{9}$$

where the sequence $(t_n)_{n\in\mathbb{N}}$ is recursively defined by $t_0=1$ and $t_{n+1}=\frac{1+\sqrt{1+4t_n^2}}{2}$. Chambolle and Dossal propose in [16] a variant of FISTA defining $\alpha_n=\frac{n-1}{n+\alpha-1}$ for any $n\in\mathbb{N}^*$, where $\alpha\geqslant 3$. The original choice of Nesterov is approximately recovered by setting $\alpha=3$.
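The general scheme (8) and the two choices of $\alpha_n$ above can be sketched as follows. This is a minimal illustration under our own assumptions: the proximal map is passed as a function `prox_h(v, s)` returning $\text{prox}_{sh}(v)$, the iteration starts at rest ($x_{-1}=x_0$, so $\alpha_0$ is irrelevant), and the toy test uses $h=0$ (identity prox) on the quadratic $\mathrm{diag}(1,100)$ with $s=\frac{1}{L}$. The check compares the FISTA iterate with the classical bound $F(x_k)-F^*\leqslant\frac{2L\|x_0-x^*\|^2}{(k+1)^2}$ of Beck and Teboulle [11].

```python
import numpy as np

def inertial_prox_grad(grad_f, prox_h, x0, s, alphas, z_is_y=True):
    """Scheme (8): y_n = x_n + alpha_n (x_n - x_{n-1}),  x_{n+1} = prox_{sh}(y_n - s grad f(z_n)).

    prox_h(v, s) must return prox_{s h}(v); z_n = y_n if z_is_y (FISTA-like) else x_n (Polyak-like).
    """
    x_prev = x = np.asarray(x0, dtype=float)     # start at rest: x_{-1} = x_0
    for alpha_n in alphas:
        y = x + alpha_n * (x - x_prev)
        z = y if z_is_y else x
        x_prev, x = x, prox_h(y - s * grad_f(z), s)
    return x

def fista_alpha_seq(n_iter):
    """[alpha_0, ..., alpha_{n_iter-1}] from (9), with t_0 = 1; alpha_0 is set to 0 (irrelevant)."""
    alphas, t_prev = [0.0], 1.0
    for _ in range(1, n_iter):
        t = (1.0 + np.sqrt(1.0 + 4.0 * t_prev ** 2)) / 2.0
        alphas.append((t_prev - 1.0) / t)
        t_prev = t
    return alphas

def chambolle_dossal_alpha_seq(n_iter, alpha=3.0):
    """alpha_n = (n - 1) / (n + alpha - 1) for n >= 1; alpha_0 is set to 0 (irrelevant)."""
    return [0.0] + [(n - 1.0) / (n + alpha - 1.0) for n in range(1, n_iter)]

A = np.diag([1.0, 100.0])                # hypothetical toy problem, x* = 0 and F* = 0
F = lambda x: 0.5 * x @ A @ x
grad_F = lambda x: A @ x
identity_prox = lambda v, s: v           # h = 0
L, N, x0 = 100.0, 50, np.array([1.0, 1.0])

x_fista = inertial_prox_grad(grad_F, identity_prox, x0, 1.0 / L, fista_alpha_seq(N))
x_cd = inertial_prox_grad(grad_F, identity_prox, x0, 1.0 / L, chambolle_dossal_alpha_seq(N))
bound = 2.0 * L * np.sum(x0 ** 2) / (N + 1) ** 2
print(F(x_fista), F(x_cd), bound)
```

Passing a constant list such as `[alpha] * N` with `z_is_y=False` gives a Polyak-type scheme with the same function.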

In this paper we consider the family of Heavy Ball algorithms, for which the friction parameter $\alpha_n$ is set to a constant $\alpha>0$ and $z_n=y_n$. The term Heavy Ball refers to a family of optimization schemes that can be interpreted as discretizations of the following second-order ordinary differential equation:

$$\ddot{x}(t)+\alpha\dot{x}(t)+\nabla F(x(t))=0, \tag{10}$$

which describes the motion of a heavy ball in a potential field with constant friction. The inertia coefficient $\alpha$ has to be tuned according to the geometry of $F$ to obtain an optimal convergence rate. For the class of $\mu$-strongly convex functions, Beck [10, Chapter 10.7.7] proposes the following choice (following Nesterov [28]):

$$\alpha=\frac{1-\sqrt{\kappa}}{1+\sqrt{\kappa}},\qquad z_n=y_n, \tag{11}$$

where $\kappa=\frac{\mu}{L}$, leading to the algorithm V-FISTA (presented by the author as a variant of FISTA).
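A minimal sketch of V-FISTA, i.e. scheme (8) with the constant parameter (11), $z_n=y_n$ and $s=\frac{1}{L}$, on the hypothetical strongly convex toy quadratic $\mathrm{diag}(1,100)$ ($\kappa=0.01$) with $h=0$. The check uses the bound $F(x_n)-F^*\leqslant(1-\sqrt{\kappa})^n\left(F(x_0)-F^*+\frac{\mu}{2}\|x_0-x^*\|^2\right)$, which is how we recall Beck's guarantee for V-FISTA in [10]; treat the exact constant as an assumption to be checked against that reference.

```python
import numpy as np

def v_fista(grad_f, prox_h, x0, L, mu, n_iter):
    """V-FISTA: scheme (8) with alpha = (1 - sqrt(kappa)) / (1 + sqrt(kappa)), z_n = y_n, s = 1/L.

    prox_h(v, s) must return prox_{s h}(v). Starts at rest: x_{-1} = x_0.
    """
    kappa = mu / L
    alpha = (1.0 - np.sqrt(kappa)) / (1.0 + np.sqrt(kappa))
    s = 1.0 / L
    x_prev = x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        y = x + alpha * (x - x_prev)                  # inertial extrapolation
        x_prev, x = x, prox_h(y - s * grad_f(y), s)   # forward-backward step at y_n
    return x

A = np.diag([1.0, 100.0])                # hypothetical toy problem: mu = 1, L = 100
F = lambda x: 0.5 * x @ A @ x            # F* = 0, x* = 0
grad_F = lambda x: A @ x
identity_prox = lambda v, s: v           # h = 0
x0, n = np.array([1.0, 1.0]), 100

x_v = v_fista(grad_F, identity_prox, x0, L=100.0, mu=1.0, n_iter=n)
bound = (1.0 - np.sqrt(0.01)) ** n * (F(x0) + 0.5 * 1.0 * np.sum(x0 ** 2))
print(F(x_v), bound)
```

On this quadratic the flat error component behaves like $(1+0.1n)\,0.9^n$, i.e. it decays at the rate $1-\sqrt{\kappa}$ per iteration, in line with the $\sqrt{\kappa}$-dependent decay discussed above.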

2.2 Convergence rate of inertial algorithms under quadratic growth assumptions

Let $F=f+h$ be a convex composite function, where $f$ is a convex, differentiable function having an $L$-Lipschitz gradient and $h$ is a proper lower semicontinuous convex function whose proximal operator is known.

When the function $F$ to minimize satisfies the additional growth assumption $\mathcal{G}^2_\mu$, Garrigos et al. [20] prove that the Forward-Backward algorithm (5) provides better theoretical guarantees than in the general convex case. More precisely, they show that the function values decay exponentially, $F(x_n)-F^*=\mathcal{O}\left(e^{-\frac{\mu}{L}n}\right)$, along the iterates of Forward-Backward, without any assumption on the set of minimizers $X^*$. Observe that this convergence rate depends on the ratio $\kappa=\frac{\mu}{L}$, which is the inverse of the condition number of $F$ and can be very small in large-scale optimization. Note also that Necoara et al. proved similar results for the projected gradient algorithm in [26].
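The absence of any uniqueness assumption can be observed on the rank-deficient least-squares example (a hypothetical instance of ours, with $h=0$ so that Forward-Backward reduces to gradient descent). With $s=\frac{1}{L}$, each error component along a nonzero eigenvalue $\lambda\in[\mu,L]$ of $A^\top A$ is multiplied by $1-\frac{\lambda}{L}\in[0,1-\kappa]$ at every step, so that $F(x_n)-F^*\leqslant(1-\kappa)^{2n}\left(F(x_0)-F^*\right)$ on this quadratic, while the component in $\ker A$ is untouched: the iterates converge to the projection of $x_0$ onto $X^*$, one minimizer among infinitely many. This exact contraction is specific to quadratics; for a general $F$ satisfying $\mathcal{G}^2_\mu$, only the rate quoted above applies.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 30, 10, 3
U, _ = np.linalg.qr(rng.standard_normal((m, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
sing = np.array([5.0, 2.0, 0.5])
A = U @ np.diag(sing) @ V.T                  # rank 3 < 10: X* is an affine subspace
A_pinv = V @ np.diag(1.0 / sing) @ U.T
b = rng.standard_normal(m)
L, mu = sing.max() ** 2, sing.min() ** 2     # largest / smallest nonzero eigenvalue of A^T A
kappa = mu / L                               # = 0.01 here

F = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
F_star = F(A_pinv @ b)

x0 = rng.standard_normal(n)
x, n_iter = x0.copy(), 2000
gaps = [F(x0) - F_star]
for _ in range(n_iter):
    x = x - (A.T @ (A @ x - b)) / L          # FB step with h = 0 and s = 1/L
    gaps.append(F(x) - F_star)

proj_x0 = x0 - A_pinv @ (A @ x0 - b)         # projection of x0 onto X*
print(gaps[-1], np.linalg.norm(x - proj_x0))
```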

While Nesterov’s acceleration speeds up gradient-based algorithms on the class of convex functions, its benefit is less clear on the class of convex functions satisfying a quadratic growth condition. For FISTA, in its historical form by Beck and Teboulle (9) or in the variant introduced by Chambolle and Dossal in [16], the convergence rate is still polynomial on this class. Note however that, for the variant of FISTA introduced in [16], Aujol et al. prove in [9] that the sequence $(x_n)_{n\in\mathbb{N}}$ provided by (8) with $\alpha_n=\frac{n-1}{n+\alpha-1}$ and $\alpha\geqslant 3$ satisfies:

$$F(x_n)-F^*=\mathcal{O}\left(n^{-\frac{2\alpha}{3}}\right), \tag{12}$$

which, when $\alpha>3$, is better than the rate $\mathcal{O}\left(n^{-2}\right)$ obtained in the convex setting in [27, 11] (in fact $o\left(n^{-2}\right)$, as proved by Attouch and Peypouquet [4]). Although this decay is not exponential, the authors show that the friction parameter $\alpha$ can be set according to a desired accuracy, in which case the number of iterations required to achieve this accuracy is comparable to that of methods ensuring a fast exponential decay of the error, i.e. an exponential decay depending on $\sqrt{\frac{\mu}{L}}$, which can be much faster than an exponential decay rate depending only on $\kappa=\frac{\mu}{L}$, as for Forward-Backward. Note that this result holds under the assumption that $F$ has a unique minimizer.

A way to accelerate the convergence of FISTA on the class of composite convex functions having a quadratic growth property is to use restart strategies. The idea of this approach is to benefit from inertia while avoiding oscillations, by re-initializing the inertia parameter to zero when some restart condition is verified. Empirical restart rules have been proposed by Giselsson and Boyd [22] or O’Donoghue and Candès [30], improving the convergence of FISTA in practice but without theoretical guarantees. Elementary computations show that re-initializing the inertia parameter every $\lfloor 2e\sqrt{\frac{L}{\mu}}\rfloor$ iterations allows the resulting sequence to satisfy:

$$F(x_n)-F^*=\mathcal{O}\left(e^{-\frac{1}{e}\sqrt{\kappa}n}\right). \tag{13}$$

This convergence rate is actually the fastest in the literature for restart methods and does not require the uniqueness of the minimizer. But note that it requires knowledge of the value of μ𝜇\muitalic_μ, see e.g. [29, 30, 26]. Recently, adaptive restart schemes have been developed aiming at exploiting the geometry assumption 𝒢μ2superscriptsubscript𝒢𝜇2\mathcal{G}_{\mu}^{2}caligraphic_G start_POSTSUBSCRIPT italic_μ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT to derive improved convergence rates without knowing exactly the growth parameter μ𝜇\muitalic_μ: Fercoq and Qu [19], Alamo et al. [1], Aujol et al. [6, 5] introduce restart schemes ensuring a fast exponential decay of the error (i.e. depending on κ)\sqrt{\kappa})square-root start_ARG italic_κ end_ARG ). The schemes having the best theoretical guarantees in this setting are that proposed by Alamo et al. in [1] (F(xn)F*=𝒪(eln(15)4eκn)𝐹subscript𝑥𝑛superscript𝐹𝒪superscript𝑒154𝑒𝜅𝑛F(x_{n})-F^{*}=\mathcal{O}\left(e^{-\frac{\ln(15)}{4e}\sqrt{\kappa}n}\right)italic_F ( italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) - italic_F start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT = caligraphic_O ( italic_e start_POSTSUPERSCRIPT - divide start_ARG roman_ln ( 15 ) end_ARG start_ARG 4 italic_e end_ARG square-root start_ARG italic_κ end_ARG italic_n end_POSTSUPERSCRIPT )) and the method introduced by Renegar and Grimmer in [33] (F(xn)F*=𝒪(e122κn)𝐹subscript𝑥𝑛superscript𝐹𝒪superscript𝑒122𝜅𝑛F(x_{n})-F^{*}=\mathcal{O}\left(e^{-\frac{1}{2\sqrt{2}}\sqrt{\kappa}n}\right)italic_F ( italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) - italic_F start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT = caligraphic_O ( italic_e start_POSTSUPERSCRIPT - divide start_ARG 1 end_ARG start_ARG 2 square-root start_ARG 2 end_ARG end_ARG square-root start_ARG italic_κ end_ARG italic_n end_POSTSUPERSCRIPT )). 
As for the optimal periodic restart, no uniqueness assumption on the set of minimizers of $F$ is needed to obtain these guarantees.

In contrast to FISTA and Nesterov's accelerated gradient method, Heavy Ball type schemes are designed for convex functions satisfying additional growth assumptions such as $\mu$-strong convexity. To this end, these methods must be calibrated according to the growth parameter $\mu$. In his seminal paper [32], Polyak introduces the first Heavy Ball method for $C^2$ $\mu$-strongly convex functions, which guarantees a convergence rate of the error of $\mathcal{O}\left(e^{-4\sqrt{\kappa}\,n}\right)$. This decay rate is very fast but relies strongly on the $C^2$ assumption: Ghadimi et al. [21] provide a $C^1$ convex function for which this method does not converge. In [28], Nesterov proposes the accelerated gradient method for strongly convex functions, which only requires a $C^1$ assumption and ensures that the error decreases as $\mathcal{O}\left(e^{-\sqrt{\kappa}\,n}\right)$. In this setting, several Heavy Ball schemes have been proposed: Siegel's Heavy Ball method [35] and the geometric descent method [15], which have the same theoretical asymptotic guarantees as Nesterov's accelerated gradient method for strongly convex functions; the Heavy Ball method of Aujol et al. [7] for strongly convex functions (which we denote ADR-$\mathcal{S}_\mu$ Heavy Ball); the triple momentum method of Van Scoy et al. [37]; and ITEM by Taylor and Drori [36]. The latter two schemes are built using the Performance Estimation Problem approach introduced by Drori and Teboulle [18], and they provide the best bounds on the error for this class of functions and first-order methods ($\mathcal{O}\left(e^{-2\sqrt{\kappa}\,n}\right)$). Some of these schemes can be adapted to composite optimization, as detailed in Table 1. Note that in [10], Beck generalizes Nesterov's accelerated gradient method for strongly convex functions to composite optimization with V-FISTA, proving the same theoretical convergence rate of the error.

| Algorithm | Reference | Assumption on $F$ | Convergence rate of $F(x_n)-F^*$ |
|---|---|---|---|
| Polyak's Heavy Ball | Polyak [32] | $\mathcal{S}_\mu$ and $C^2$ | $\mathcal{O}\left(e^{-4\sqrt{\kappa}n}\right)$ |
| Nesterov's accelerated gradient method for strongly convex functions | Nesterov [28] | $\mathcal{S}_\mu$ and $C^1$ | $\mathcal{O}\left(e^{-\sqrt{\kappa}n}\right)$ |
| | Necoara et al. [26] | $\mathcal{Q}_\mu$, $C^1$ and uniqueness of the projection of the iterates onto $X^*$ | $\mathcal{O}\left(e^{-\sqrt{\kappa}n}\right)$ |
| Geometric descent method | Bubeck et al. [15]; Chen et al. [17] | $\mathcal{S}_\mu$; adapted to composite optimization | $\mathcal{O}\left(e^{-\sqrt{\kappa}n}\right)$ |
| Triple momentum method | Van Scoy et al. [37] | $\mathcal{S}_\mu$ and $C^1$ | $\mathcal{O}\left(e^{-2\sqrt{\kappa}n}\right)$ |
| ITEM | Taylor, Drori [36] | $\mathcal{S}_\mu$ and $C^1$ | $\mathcal{O}\left(e^{-2\sqrt{\kappa}n}\right)$ |
| Polyak's Heavy Ball with general friction | Ghadimi et al. [21] | $\mathcal{S}_\mu$ and $C^1$ | $\mathcal{O}\left(e^{-\kappa n}\right)$ |
| Siegel's Heavy Ball | Siegel [35] | $\mathcal{S}_\mu$ and $C^1$; adapted to composite optimization | $\mathcal{O}\left(e^{-\sqrt{\kappa}n}\right)$ |
| V-FISTA | Beck [10] | $\mathcal{S}_\mu$; adapted to composite optimization | $\mathcal{O}\left(e^{-\sqrt{\kappa}n}\right)$ |
| ADR-$\mathcal{S}_\mu$ Heavy Ball | Aujol et al. [7] | $\mathcal{S}_\mu$; adapted to composite optimization | $\mathcal{O}\left(e^{\left(-\sqrt{2\kappa}+\mathcal{O}(\kappa)\right)n}\right)$ |
| ADR-$\mathcal{G}^2_\mu$ Heavy Ball | Aujol et al. [8] | $\mathcal{G}^2_\mu$ and uniqueness of the minimizer; adapted to composite optimization | $\mathcal{O}\left(e^{\left(-(2-\sqrt{2})\sqrt{\kappa}+\mathcal{O}(\kappa)\right)n}\right)$ |
| ADR-$\mathcal{G}^2_\mu$ Heavy Ball | Aujol et al. [8] | $\mathcal{G}^2_\mu$; adapted to composite optimization | $\mathcal{O}\left(e^{\left(-\kappa+\varepsilon+\mathcal{O}(\kappa)\right)n}\right)$ |

Table 1: Convergence rate of $F(x_n)-F^*$ for Heavy Ball type schemes under various geometry assumptions on $F$.

Recently, Heavy Ball type schemes have been studied under geometry assumptions weaker than strong convexity. Necoara et al. prove in [26] that the convergence rate of Nesterov's accelerated gradient method for strongly convex functions remains valid for $C^1$ $\mu$-quasi-strongly convex functions, i.e. for functions satisfying:

$$\forall x\in\mathbb{R}^N,\quad \langle\nabla F(x),x-x^*\rangle\geqslant F(x)-F(x^*)+\frac{\mu}{2}\|x-x^*\|^2, \tag{14}$$

where $x^*$ denotes the projection of $x$ onto $X^*$, provided that the iterates share the same projection onto the set of minimizers. In [8], Aujol et al. build a Heavy Ball type scheme (ADR-$\mathcal{G}^2_\mu$ Heavy Ball) for functions satisfying the quadratic growth assumption $\mathcal{G}^2_\mu$, guaranteeing that

$$F(x_n)-F^*=\mathcal{O}\left(e^{\left(-(2-\sqrt{2})\sqrt{\kappa}+\mathcal{O}(\kappa)\right)n}\right), \tag{15}$$

as long as $F$ has a unique minimizer.
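To make assumption (14) and the quadratic growth condition concrete, the following sketch checks both numerically on a hypothetical rank-deficient least-squares function $F(x)=\frac{1}{2}\|Ax-y\|^2$. Its minimizers form an affine subspace, so there is no unique minimizer. It takes $\mathcal{G}^2_\mu$ in the form $F(x)-F^*\geqslant\frac{\mu}{2}d(x,X^*)^2$ (an assumption about the definition given earlier in the paper). Since $x-x^*$ lies in the range of $A^T$, both inequalities should hold with $\mu$ equal to the smallest nonzero eigenvalue of $A^TA$. The matrix sizes, random seed and helper names (`proj_X_star`, `gaps_14`, `gaps_growth`) are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical rank-deficient instance: rank(A) = 2 < N = 4, so X* = argmin F
# is an affine subspace and F has no unique minimizer.
A = rng.standard_normal((3, 2)) @ rng.standard_normal((2, 4))
y = rng.standard_normal(3)

A_pinv = np.linalg.pinv(A)
x_ls = A_pinv @ y                           # minimum-norm minimizer
P_ker = np.eye(4) - A_pinv @ A              # orthogonal projector onto ker(A)
eigs = np.linalg.eigvalsh(A.T @ A)
mu = eigs[eigs > 1e-10 * eigs.max()].min()  # smallest nonzero eigenvalue of A^T A

def F(x):
    return 0.5 * np.sum((A @ x - y) ** 2)

def grad_F(x):
    return A.T @ (A @ x - y)

def proj_X_star(x):
    # X* = x_ls + ker(A): keep the ker(A) component of x - x_ls
    return x_ls + P_ker @ (x - x_ls)

F_star = F(x_ls)
gaps_14, gaps_growth = [], []
for _ in range(1000):
    x = 10.0 * rng.standard_normal(4)
    xs = proj_X_star(x)
    dist2 = np.sum((x - xs) ** 2)
    # slack in (14): <grad F(x), x - x*> - (F(x) - F* + mu/2 ||x - x*||^2)
    gaps_14.append(grad_F(x) @ (x - xs) - (F(x) - F_star + 0.5 * mu * dist2))
    # slack in quadratic growth: F(x) - F* - mu/2 d(x, X*)^2
    gaps_growth.append(F(x) - F_star - 0.5 * mu * dist2)

print(mu, min(gaps_14), min(gaps_growth))
```

Both minimal slacks should be nonnegative up to rounding. A larger $\mu$ would make them negative along the eigenvector of the smallest nonzero eigenvalue.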

Thus, the theoretical guarantees of Heavy Ball type schemes are the best in the literature among first-order methods for functions satisfying growth conditions, but they do not hold without assuming the uniqueness of the minimizer. If this hypothesis is not satisfied, the theoretical convergence rates are similar to those of Forward-Backward, and the relevance of applying such algorithms in this context can therefore be questioned.

3 Fast exponential decay for Heavy Ball type algorithms

Let us now consider Heavy Ball type methods that can be generically described as variants of the V-FISTA algorithm proposed by Beck in [10]:

$$\forall n\in\mathbb{N},\quad \left\{\begin{aligned} x_{n+1}&=\text{prox}_{sh}\left(y_n-s\nabla f(y_n)\right),\\ y_{n+1}&=x_{n+1}+\alpha(x_{n+1}-x_n),\end{aligned}\right. \tag{V-FISTA}$$

with $x_0\in\mathbb{R}^N$, $s=\frac{1}{L}$, $y_0=x_0$ and any $\alpha>0$. Recall that in the original definition of V-FISTA [10], the damping parameter $\alpha$ is set to $\frac{1-\sqrt{\kappa}}{1+\sqrt{\kappa}}$, where $\kappa=\frac{\mu}{L}$ denotes the inverse of the condition number of the function $F$ to minimize.
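A minimal NumPy sketch of the (V-FISTA) iteration follows. It assumes the caller supplies `grad_f` and a function `prox_h(v, s)` returning $\text{prox}_{sh}(v)$. These names and the `v_fista` signature are hypothetical, not taken from the paper. The usage example runs it on a hypothetical strongly convex quadratic with $h=0$ (so the prox is the identity) and Beck's original inertia $\frac{1-\sqrt{\kappa}}{1+\sqrt{\kappa}}$.

```python
import numpy as np

def v_fista(grad_f, prox_h, x0, L, alpha, n_iter):
    """Run (V-FISTA): x_{n+1} = prox_{sh}(y_n - s grad f(y_n)),
    y_{n+1} = x_{n+1} + alpha (x_{n+1} - x_n), with s = 1/L and y_0 = x_0.

    prox_h(v, s) must return prox_{s h}(v). Returns the list [x_0, ..., x_{n_iter}].
    """
    s = 1.0 / L
    x_prev = np.asarray(x0, dtype=float)
    y = x_prev.copy()
    iterates = [x_prev]
    for _ in range(n_iter):
        x = prox_h(y - s * grad_f(y), s)   # forward-backward step at y_n
        y = x + alpha * (x - x_prev)       # extrapolation with constant inertia
        x_prev = x
        iterates.append(x)
    return iterates

# Usage on a hypothetical quadratic f(x) = 0.5 * (x_1^2 + 0.01 x_2^2), h = 0:
# L = 1, mu = 0.01, and Beck's inertia (1 - sqrt(kappa)) / (1 + sqrt(kappa)).
d = np.array([1.0, 0.01])
kappa = d.min() / d.max()
alpha_beck = (1.0 - np.sqrt(kappa)) / (1.0 + np.sqrt(kappa))
xs = v_fista(grad_f=lambda v: d * v, prox_h=lambda v, s: v,
             x0=np.ones(2), L=1.0, alpha=alpha_beck, n_iter=500)
print(np.linalg.norm(xs[-1]))
```

Only the inertia changes between the variants discussed in this section. The theorems below replace $\frac{1-\sqrt{\kappa}}{1+\sqrt{\kappa}}$ by $1-\omega\sqrt{\kappa}$.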

The main contribution of this section is to prove that Heavy Ball methods like (V-FISTA) can be properly parameterized to achieve fast exponential decay rates (i.e. rates depending on $\sqrt{\kappa}$) for the class of convex composite functions satisfying a quadratic growth property $\mathcal{G}^2_\mu$, without assuming the uniqueness of the minimizer. An example of such a function is the LASSO functional:

$$F(x)=\frac{1}{2}\|Ax-y\|_2^2+\lambda\|x\|_1. \tag{16}$$

This function $F$ is convex and satisfies a quadratic growth condition, but may not have a unique minimizer. To the best of our knowledge, this is the first result proving that an inertial method can actually improve on the convergence rate of the Forward-Backward algorithm, which is in $O(e^{-\kappa n})$ and not in $O(e^{-c\sqrt{\kappa}n})$. In large-scale problems, the inverse $\kappa$ of the condition number of the function to minimize can be very small, so that a decay in $\kappa$ can be much slower than a decay in $\sqrt{\kappa}$.
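The LASSO case can be illustrated with a small NumPy sketch of the soft-thresholding prox of $\lambda\|\cdot\|_1$, the functional (16) and one Forward-Backward step. It uses a hypothetical $1\times 2$ matrix $A$ with two identical columns, chosen so that the minimizers are not unique. With $A=(1\ 1)$, $y=2$ and $\lambda=1$, every $x\geqslant 0$ with $x_1+x_2=1$ is a minimizer and $F^*=\frac{3}{2}$. The helper names are illustrative only.

```python
import numpy as np

def soft_threshold(v, t):
    """prox of t * ||.||_1: componentwise shrinkage of v towards 0 by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_objective(x, A, y, lam):
    """LASSO functional (16): 0.5 ||Ax - y||_2^2 + lam ||x||_1."""
    return 0.5 * np.sum((A @ x - y) ** 2) + lam * np.sum(np.abs(x))

def forward_backward_step(x, A, y, lam, s):
    """One Forward-Backward step: prox_{s lam ||.||_1}(x - s A^T (Ax - y))."""
    return soft_threshold(x - s * (A.T @ (A @ x - y)), s * lam)

# Hypothetical toy instance with two identical columns: every x >= 0 with
# x_1 + x_2 = 1 minimizes F, and F* = 1.5.
A = np.array([[1.0, 1.0]])
y = np.array([2.0])
lam = 1.0
s = 1.0 / np.linalg.norm(A, 2) ** 2   # s = 1/L with L = ||A||_2^2 = 2

points = [np.array([1.0, 0.0]), np.array([0.5, 0.5]), np.array([0.0, 1.0])]
for p in points:
    # each minimizer has the same value and is a fixed point of the FB map
    print(p, lasso_objective(p, A, y, lam), forward_backward_step(p, A, y, lam, s))
```

All three points give $F=\frac{3}{2}$ and are fixed points of the Forward-Backward map. An inertial method such as (V-FISTA) may converge to any point of this segment, depending on its initialization.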

Our purpose is to build an inertial algorithm providing fast exponential decay, that is, ensuring that $F(x_n)-F(x^*)=O(e^{-c\sqrt{\kappa}n})$, but not to optimize the value of $c$. Theorems 1 and 2 both provide such bounds. In Theorem 1, the value of the inertial parameter $\alpha$ is chosen to provide a fast decay rate under a mild condition on $\kappa$, using a fairly simple Lyapunov analysis. Other choices could have been made with the same approach, leading to slightly different inertia, rates, conditions on $\kappa$ and Lyapunov functions.

The rates given by Theorem 2 differ from those of Theorem 1 since the objective is not the same. This second theorem provides bounds for a large set of inertial parameters, and the inertia is not optimized for a fixed bound on $\kappa$. Indeed, in various settings, the exact value of $\kappa$ is not known. For the LASSO functional (16), for example, the quadratic growth parameter $\mu$ may be hard to estimate. The goal of this second theorem is to provide fast exponential decay even when the inertial parameter is not set to its optimal value, and to provide rates that are more accurate when $\kappa$ is smaller. The proof of this second theorem is inspired by the proof of Theorem 3, which provides results on the solution of the Heavy Ball dynamical system.

Theorem 1.

Let $F=f+h$ where $f$ is a convex differentiable function having an $L$-Lipschitz gradient for some $L>0$, and $h$ a proper convex l.s.c. function. Assume that $F$ satisfies a quadratic growth condition $\mathcal{G}_\mu^2$ for some real parameter $\mu>0$. Let $(x_n)_{n\in\mathbb{N}}$ be the sequence provided by (V-FISTA) with $\alpha=1-\frac{5}{3\sqrt{3}}\sqrt{\kappa}$ and $s=\frac{1}{L}$. If $\kappa\leqslant\frac{1}{3}$, then:

$$\forall n\in\mathbb{N},\quad F(x_n)-F^*\leqslant\frac{4}{3}\left(1-\frac{2}{3\sqrt{3}}\sqrt{\kappa}\right)^n\left(F(x_0)-F^*\right), \tag{17}$$

and

$$\|x_n-x_{n-1}\|=\mathcal{O}\left(e^{-\frac{\sqrt{\kappa}}{3\sqrt{3}}n}\right). \tag{18}$$
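As an illustration of Theorem 1 (not part of the original analysis), the sketch below computes the prescribed inertia and contraction factor. It then runs (V-FISTA) with $h=0$ on a hypothetical degenerate quadratic whose minimizers form a line, checking the bound (17) at every iterate. It takes $\mathcal{G}^2_\mu$ in the form $F(x)-F^*\geqslant\frac{\mu}{2}d(x,X^*)^2$, which gives $\mu=0.27$ here, and $\kappa=0.27\leqslant\frac{1}{3}$ makes $\sqrt{\kappa}/(3\sqrt{3})=\frac{1}{10}$ exactly. The function `theorem1_parameters` and the test data are illustrative.

```python
import numpy as np

def theorem1_parameters(kappa):
    """Inertia alpha and contraction factor of Theorem 1 (valid for 0 < kappa <= 1/3)."""
    if not 0.0 < kappa <= 1.0 / 3.0:
        raise ValueError("Theorem 1 assumes 0 < kappa <= 1/3")
    c = np.sqrt(kappa) / (3.0 * np.sqrt(3.0))
    alpha = 1.0 - 5.0 * c   # alpha = 1 - 5/(3 sqrt(3)) sqrt(kappa)
    rate = 1.0 - 2.0 * c    # factor (1 - 2/(3 sqrt(3)) sqrt(kappa)) in (17)
    return alpha, rate

# Hypothetical test function: F(x) = 0.5 * sum_i d_i x_i^2 with d_3 = 0, so that
# X* = {x : x_1 = x_2 = 0} is a line (no unique minimizer), F* = 0, L = 1 and
# F satisfies quadratic growth with mu = 0.27 (smallest nonzero d_i).
d = np.array([1.0, 0.27, 0.0])
L, mu = 1.0, 0.27
alpha, rate = theorem1_parameters(mu / L)

def F(x):
    return 0.5 * np.sum(d * x ** 2)

x_prev = np.array([1.0, 1.0, 5.0])
y = x_prev.copy()                    # y_0 = x_0
F0 = F(x_prev)
bound_holds = True
for n in range(1, 201):
    x = y - (1.0 / L) * d * y        # h = 0: the prox step is the identity
    y = x + alpha * (x - x_prev)     # constant inertia from Theorem 1
    x_prev = x
    bound_holds = bound_holds and F(x) <= 4.0 / 3.0 * rate ** n * F0   # check (17)

print(alpha, rate, bound_holds)
```

Here $\alpha=\frac{1}{2}$ and the factor in (17) is $0.8$. The third coordinate never moves, so the iterates converge to the minimizer $(0,0,5)$ rather than to the minimum-norm one.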

Theorem 1, whose proof can be found in Section 5.1, ensures that for a well-chosen parameter $\alpha$ depending on $\kappa$, the decay of the error along the iterates of (V-FISTA) is at worst of order $\mathcal{O}\left(e^{-\frac{2}{3\sqrt{3}}\sqrt{\kappa}\,n}\right)$. Compared with the literature, this convergence rate is slower than those of most other Heavy Ball schemes. However, recall that the assumptions on $F$ required in these works (summarized in Table 1) are stronger than those needed in Theorem 1. The only method proposed for functions satisfying $\mathcal{G}^2_\mu$, i.e. ADR-$\mathcal{G}^2_\mu$ Heavy Ball [8], ensures a fast exponential decay of the error:

$$F(x_n)-F^*=\mathcal{O}\left(e^{\left(-(2-\sqrt{2})\sqrt{\kappa}+\mathcal{O}(\kappa)\right)n}\right),$$

if the function $F$ has a unique minimizer. This theoretical decay is faster than (17), but it does not hold if $F$ has multiple minimizers. To the authors' knowledge, the fast exponential decay of the error given by Theorem 1 is the first in the literature for Heavy Ball methods in this setting without any uniqueness assumption on the set of minimizers.

In addition, the guarantee on the decay of the error given in (17) is faster than that of FISTA restarted periodically and optimally, which only ensures (even with some oracle [26])

$$F(x_n)-F^*=\mathcal{O}\left(e^{-\frac{1}{e}\sqrt{\kappa}\,n}\right).$$

This means that in this setting, (V-FISTA) is theoretically more relevant than a periodic restart of FISTA when the growth parameter $\mu$ is known. In other words, one should use a constant inertial parameter depending on $\mu$ and $L$ rather than an increasing inertial parameter that is optimally re-initialized.
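To put these constants side by side, the following sketch lists the leading constants $c$ in bounds of the form $F(x_n)-F^*=\mathcal{O}\left(e^{-c\sqrt{\kappa}\,n}\right)$ quoted in this section. For (17), it uses $\left(1-c\sqrt{\kappa}\right)^n\leqslant e^{-c\sqrt{\kappa}\,n}$. For Corollary 1 below, only the limit $c=\frac{1}{2}$ as $\kappa\to 0$ is used. It also reports a rough iteration count $\ln(1/\varepsilon)/(c\sqrt{\kappa})$, which ignores all multiplicative constants. The values of $\kappa$ and $\varepsilon$ and the names `constants` and `iterations_needed` are hypothetical.

```python
import math

# Leading constants c in bounds of the form F(x_n) - F* = O(exp(-c * sqrt(kappa) * n)),
# as quoted in this section (multiplicative constants are ignored).
constants = {
    "FISTA, optimal periodic restart (13)": 1.0 / math.e,
    "Alamo et al. [1]": math.log(15.0) / (4.0 * math.e),
    "Renegar and Grimmer [33]": 1.0 / (2.0 * math.sqrt(2.0)),
    "V-FISTA, Theorem 1 (17)": 2.0 / (3.0 * math.sqrt(3.0)),
    "V-FISTA, Corollary 1 (kappa -> 0)": 0.5,
}

def iterations_needed(c, kappa, eps):
    """Rough n such that exp(-c * sqrt(kappa) * n) <= eps."""
    return math.ceil(math.log(1.0 / eps) / (c * math.sqrt(kappa)))

kappa, eps = 1e-4, 1e-6   # hypothetical conditioning and target accuracy
for name, c in sorted(constants.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:38s} c = {c:.4f}   n ~ {iterations_needed(c, kappa, eps)}")
```

The gap between $\frac{2}{3\sqrt{3}}\approx 0.385$ and $\frac{1}{e}\approx 0.368$ is small, so the practical advantage over optimal periodic restart suggested by these worst-case bounds is modest.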

The second claim of Theorem 1 gives an asymptotic control on $\|x_n-x_{n-1}\|$, which ensures that $\sum_{n\in\mathbb{N}^*}\|x_n-x_{n-1}\|<+\infty$. As a consequence, the trajectory of the sequence $(x_n)_{n\in\mathbb{N}}$ has finite length and the sequence converges strongly to a minimizer of $F$.
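The finite-length argument can be spelled out as follows. Here $M$ denotes the (unspecified) constant hidden in the $\mathcal{O}$ of (18), enlarged if necessary so that the bound holds for all $n\geqslant 1$:

$$\begin{aligned}
&\|x_n-x_{n-1}\|\leqslant M q^n,\quad q=e^{-\frac{\sqrt{\kappa}}{3\sqrt{3}}}\in(0,1)
\;\Longrightarrow\;
\sum_{n\geqslant 1}\|x_n-x_{n-1}\|\leqslant\frac{Mq}{1-q}<+\infty,\\
&\forall m>n,\quad \|x_m-x_n\|\leqslant\sum_{k=n+1}^{m}\|x_k-x_{k-1}\|\leqslant\frac{Mq^{n+1}}{1-q}\xrightarrow[n\to+\infty]{}0.
\end{aligned}$$

Hence $(x_n)_{n\in\mathbb{N}}$ is a Cauchy sequence and converges to some $x_\infty$. Since $F(x_n)\to F^*$ by (17) and $F$ is l.s.c., $F(x_\infty)\leqslant\liminf_{n\to+\infty}F(x_n)=F^*$, so $x_\infty\in X^*$.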

Below is a second theorem about the iterates of (V-FISTA), which gives stronger and more general results than Theorem 1. Its proof relies on the parallel with dynamical systems (see Section 4) and is given in Section 5.2.

Theorem 2.

Let $F=f+h$ where $f$ is a convex differentiable function having an $L$-Lipschitz gradient for some $L>0$, and $h$ a proper convex l.s.c. function. Assume that $F$ satisfies a quadratic growth condition $\mathcal{G}_\mu^2$ for some real parameter $\mu>0$. Let $(x_n)_{n\in\mathbb{N}}$ be the sequence provided by (V-FISTA) with $\alpha=1-\omega\sqrt{\kappa}$, where $\omega\in\left(0,\frac{1}{\sqrt{\kappa}}\right)$. Then, for any $n\in\mathbb{N}$:

$$F(x_n)-F^*\leqslant\left(1+(\omega-\tau)^2+(\omega-\tau)\,\omega\tau\sqrt{\kappa}\right)\left(1-\tau\sqrt{\kappa}+\tau^2\kappa\right)^n\left(F(x_0)-F^*\right), \tag{19}$$

and

$$\|x_n-x_{n-1}\|=\mathcal{O}\left(e^{-\frac{1}{2}\tau\sqrt{\kappa}\left(1-\tau\sqrt{\kappa}\right)n}\right), \tag{20}$$

where $\tau>0$ satisfies the following inequality:

$$\left(1-\omega\sqrt{\kappa}\right)\tau^3-\omega\left(2-\omega\sqrt{\kappa}\right)\tau^2+(\omega^2+2)\tau-\omega\leqslant 0. \tag{21}$$

The statements of Theorem 2 are less readable than those of Theorem 1, but they are actually stronger. Inequality (21) implicitly determines the convergence rates that can be obtained for a given $\omega\in\left(0,\frac{1}{\sqrt{\kappa}}\right)$. Observe that the larger $\tau$, the better the convergence rate. The best rates are obtained when:

$$\left(1-\omega\sqrt{\kappa}\right)\tau^3-\omega\left(2-\omega\sqrt{\kappa}\right)\tau^2+(\omega^2+2)\tau-\omega=0. \tag{22}$$

The maximal admissible value of $\tau$ can thus be obtained by studying the limit case $\kappa=0$:

$$\tau^3-2\omega\tau^2+(\omega^2+2)\tau-\omega=0, \tag{23}$$

i.e. (dividing by $\tau>0$):

$$\omega^2-\frac{1+2\tau^2}{\tau}\,\omega+2+\tau^2=0. \tag{24}$$

This can be seen as a quadratic polynomial in $\omega$, whose discriminant is:

$$\begin{aligned}
\Delta&=\frac{1}{\tau^2}\left(1+4\tau^4+4\tau^2-8\tau^2-4\tau^4\right)\\
&=\frac{1}{\tau^2}\left(1-4\tau^2\right)=\frac{1}{\tau^2}(1-2\tau)(1+2\tau).
\end{aligned}$$

Hence the largest value of $\tau$ for which the discriminant satisfies $\Delta\geqslant 0$ is $\tau=\frac{1}{2}$, which corresponds to the limit value $\omega=\frac{1+2\tau^2}{2\tau}=\frac{3}{2}$ (the double root of (24) when $\Delta=0$). These two observations highlight that, for suitable choices of $\alpha$, the convergence rates given by Theorem 2 are faster than those given by Theorem 1, as expressed in the following corollary. Note that the convergence guarantees and the best choices of $\alpha$ depend on the value of the condition number, since $\kappa$ appears in Equation (22).

Corollary 1.

Let $F=f+h$ where $f$ is a convex differentiable function having an $L$-Lipschitz gradient for some $L>0$, and $h$ a proper convex l.s.c. function. Assume that $F$ satisfies a quadratic growth condition $\mathcal{G}_\mu^2$ for some real parameter $\mu>0$. Let $\kappa=\frac{\mu}{L}$.

Let $(\omega,\tau)\in(\mathbb{R}_+)^2$ be two real parameters chosen such that:

$$\left(1-\omega\sqrt{\kappa}\right)\tau^3-\omega\left(2-\omega\sqrt{\kappa}\right)\tau^2+(\omega^2+2)\tau-\omega=0 \tag{25}$$

and the value of $\tau$ is maximal. Let $(x_n)_{n\in\mathbb{N}}$ be the sequence provided by (V-FISTA) with $\alpha=1-\omega\sqrt{\kappa}$. Then:

F(xn)F*𝐹subscript𝑥𝑛superscript𝐹\displaystyle F(x_{n})-F^{*}italic_F ( italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) - italic_F start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT C(1σκ)n(F(x0)F*)absent𝐶superscript1𝜎𝜅𝑛𝐹subscript𝑥0superscript𝐹\displaystyle\leqslant C\left(1-\sigma\sqrt{\kappa}\right)^{n}(F(x_{0})-F^{*})⩽ italic_C ( 1 - italic_σ square-root start_ARG italic_κ end_ARG ) start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ( italic_F ( italic_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) - italic_F start_POSTSUPERSCRIPT * end_POSTSUPERSCRIPT ) (26)

and

xnxn1=𝒪(eσ2κn),normsubscript𝑥𝑛subscript𝑥𝑛1𝒪superscript𝑒𝜎2𝜅𝑛\|x_{n}-x_{n-1}\|=\mathcal{O}\left(e^{-\frac{\sigma}{2}\sqrt{\kappa}n}\right),∥ italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_n - 1 end_POSTSUBSCRIPT ∥ = caligraphic_O ( italic_e start_POSTSUPERSCRIPT - divide start_ARG italic_σ end_ARG start_ARG 2 end_ARG square-root start_ARG italic_κ end_ARG italic_n end_POSTSUPERSCRIPT ) , (27)

where:

C=1+(ωτ)2+(ωτ)ωτκ,σ=ττ2κ,formulae-sequence𝐶1superscript𝜔𝜏2𝜔𝜏𝜔𝜏𝜅𝜎𝜏superscript𝜏2𝜅C=1+(\omega-\tau)^{2}+(\omega-\tau)\omega\tau\sqrt{\kappa},\quad\sigma=\tau-% \tau^{2}\sqrt{\kappa},italic_C = 1 + ( italic_ω - italic_τ ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ( italic_ω - italic_τ ) italic_ω italic_τ square-root start_ARG italic_κ end_ARG , italic_σ = italic_τ - italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT square-root start_ARG italic_κ end_ARG ,

and there exist three real-valued functions εisubscript𝜀𝑖\varepsilon_{i}italic_ε start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, i=1,2,3𝑖123i=1,2,3italic_i = 1 , 2 , 3 such that: limt0εi(t)=0subscriptnormal-→𝑡0subscript𝜀𝑖𝑡0\lim_{t\to 0}\varepsilon_{i}(t)=0roman_lim start_POSTSUBSCRIPT italic_t → 0 end_POSTSUBSCRIPT italic_ε start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_t ) = 0 and:

$$\omega=\frac{3}{2}-\varepsilon_1(\kappa),\qquad \tau=\frac{1}{2}-\varepsilon_2(\kappa),\qquad \sigma=\frac{1}{2}-\varepsilon_3(\kappa).$$
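The limits in Corollary 1 can be checked by linearizing (25) around $\kappa=0$. The computation below is our own first-order sketch and is not part of the original statement. It assumes that "the value of $\tau$ is maximum" means that $\omega$ is chosen to maximize $\tau$ along the curve (25), so that $\partial_\omega P=0$ at the optimum. Here $s=\sqrt{\kappa}$ and all derivatives are taken at $(\tau,\omega,s)=(\frac12,\frac32,0)$.

```latex
\begin{aligned}
P(\tau,\omega,s) &= (1-\omega s)\tau^3-\omega(2-\omega s)\tau^2+(\omega^2+2)\tau-\omega,\\
P &= \tfrac18-\tfrac34+\tfrac{17}{8}-\tfrac32 = 0,\qquad
\partial_\omega P = -2\tau^2+2\omega\tau-1 = 0,\\
\partial_\tau P &= 3\tau^2-4\omega\tau+\omega^2+2 = 2,\qquad
\partial_s P = -\omega\tau^3+\omega^2\tau^2 = \tfrac38,\\
\partial_\tau\partial_\omega P &= -4\tau+2\omega = 1,\qquad
\partial_\omega^2 P = 2\tau = 1,\qquad
\partial_s\partial_\omega P = -\tau^3+2\omega\tau^2 = \tfrac58,\\
&\Longrightarrow\quad
\tau \approx \tfrac12-\tfrac{3}{16}\sqrt{\kappa},\qquad
\omega \approx \tfrac32-\tfrac{7}{16}\sqrt{\kappa},\qquad
\sigma=\tau-\tau^2\sqrt{\kappa} \approx \tfrac12-\tfrac{7}{16}\sqrt{\kappa}.
\end{aligned}
```

Under this reading, $\varepsilon_1(\kappa)\approx\frac{7}{16}\sqrt{\kappa}$, $\varepsilon_2(\kappa)\approx\frac{3}{16}\sqrt{\kappa}$ and $\varepsilon_3(\kappa)\approx\frac{7}{16}\sqrt{\kappa}$. This agrees with the small-$\kappa$ rows of Table 2 up to rounding.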

Table 2 provides admissible sets of parameters $(\omega,\tau)$ for Corollary 1.

κ        ω        τ        σ        C
1        1.2      0.39     0.23     2.04
1/3      1.32     0.42     0.31     2.1
10⁻¹     1.39     0.45     0.38     2.07
10⁻²     1.46     0.48     0.45     2.03
10⁻³     1.49     0.494    0.486    2.02
10⁻⁴     1.495    0.498    0.495    2.002
Table 2: Admissible sets of parameters for Corollary 1.
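Table 2 can be reproduced approximately by a numerical search. The Python sketch below is an illustration and not the authors' procedure. It makes two assumptions:

- for a given $\omega$, the admissible $\tau$ is the smallest positive real root of (25);
- $\omega$ is chosen to maximize this root, using a coarse-to-fine grid search.

It then evaluates $\sigma$ and $C$ with the formulas of Corollary 1. The helper names `tau_of_omega` and `corollary1_parameters` are hypothetical. For small $\kappa$ the output matches Table 2 up to rounding. For $\kappa$ close to 1, the maximizing $\omega$ found here may differ somewhat from the rounded table entries.

```python
import numpy as np


def tau_of_omega(omega, kappa):
    """Smallest positive real root in tau of the cubic (25), for fixed omega and kappa.

    Assumption: this is the admissible root; -inf is returned if there is none.
    """
    s = np.sqrt(kappa)
    coeffs = [1.0 - omega * s, -omega * (2.0 - omega * s), omega**2 + 2.0, -omega]
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-10].real
    positive = real[real > 0]
    return positive.min() if positive.size else -np.inf


def corollary1_parameters(kappa, lo=0.5, hi=2.5, levels=4, points=401):
    """Maximize tau(omega) by a coarse-to-fine grid search, then evaluate sigma and C."""
    for _ in range(levels):
        omegas = np.linspace(lo, hi, points)
        taus = np.array([tau_of_omega(w, kappa) for w in omegas])
        i = int(np.argmax(taus))
        # Zoom in on the two grid cells surrounding the current best omega.
        lo, hi = omegas[max(i - 1, 0)], omegas[min(i + 1, points - 1)]
    omega, tau = omegas[i], taus[i]
    s = np.sqrt(kappa)
    sigma = tau - tau**2 * s
    C = 1 + (omega - tau) ** 2 + (omega - tau) * omega * tau * s
    return omega, tau, sigma, C


for kappa in [1.0, 1 / 3, 1e-1, 1e-2, 1e-3, 1e-4]:
    omega, tau, sigma, C = corollary1_parameters(kappa)
    print(f"kappa={kappa:.1e}  omega={omega:.3f}  tau={tau:.3f}  sigma={sigma:.3f}  C={C:.3f}")
```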

Thus, for small values of $\kappa$ (see the last rows of Table 2), Corollary 1 provides better convergence rates than Theorem 1 (since $\frac{2}{3\sqrt{3}}\approx 0.38$). Note that the guarantees given by Aujol et al. in [8] for ADR-$\mathcal{G}^2_\mu$ are still better, under the additional assumption that $F$ has a unique minimizer.

In fact, Theorem 2 and inequality (21) reveal more than improved convergence rates. To illustrate this, Figure 1 displays the evolution of $\tau$ with respect to $\omega$ and $\kappa$ such that $(\tau,\omega,\kappa)$ satisfies (22). An interactive graph is available at https://www.desmos.com/calculator/syrtiatos6. This graph shows that inequality (21) yields convergence guarantees even for non-optimal choices of $\alpha$, i.e. large values of $\omega$.

Figure 1: Evolution of $\tau$ with respect to $\omega$ for several values of $\kappa$ such that $(\tau,\omega,\kappa)$ satisfies (22).

Building on this observation, the following corollary provides convergence rates for (V-FISTA) when $\alpha$ is too small, which can happen if $\mu$ is overestimated. A brief proof is given in Section 5.3.

Corollary 2.

Let $F=f+h$ where $f$ is a convex differentiable function having an $L$-Lipschitz gradient for some $L>0$, and $h$ is a proper convex l.s.c. function. Assume that $F$ satisfies a quadratic growth condition $\mathcal{G}_\mu^2$ for some real parameter $\mu>0$. Let $(x_n)_{n\in\mathbb{N}}$ be the sequence provided by (V-FISTA) with $s=\frac{1}{L}$ and $\alpha=1-\theta$ for some $\theta\in\left[\frac{3}{2}\sqrt{\kappa},1\right)$. Then, if $\kappa\leqslant\frac{1}{10}$,

$$F(x_n)-F^*=\mathcal{O}\left(e^{-\tau\kappa n}\right), \qquad (28)$$

and

$$\|x_n-x_{n-1}\|=\mathcal{O}\left(e^{-\frac{\tau}{2}\kappa n}\right), \qquad (29)$$

where $\tau=\frac{2}{3\theta}\left(1-\frac{2}{3\theta}\sqrt{\kappa}\right)$.

Corollary 2 allows us to derive convergence rates for (V-FISTA) even if α𝛼\alphaitalic_α is far from its optimal value. Let us describe two examples:

  • Suppose that $\alpha=1-C\sqrt{\kappa}$ where $C\geqslant\frac{3}{2}$. Applying Corollary 2 with $\theta=C\sqrt{\kappa}$ gives $\frac{2}{3\theta}\left(1-\frac{2}{3\theta}\sqrt{\kappa}\right)\kappa=\frac{2(3C-2)}{9C^2}\sqrt{\kappa}$. Hence the iterates of (V-FISTA) with this inertial parameter ensure a decrease of the error in $\mathcal{O}\left(e^{-\tau\sqrt{\kappa}n}\right)$, where $\tau=\frac{2(3C-2)}{9C^2}$. Thus, if we choose $\alpha=1-\frac{3}{2}\sqrt{\tilde{\kappa}}$ where $\tilde{\kappa}$ is an upper estimate of $\kappa$, we get a theoretical guarantee on the error with $C=\frac{3}{2}\sqrt{\tilde{\kappa}/\kappa}$. For instance, if $\tilde{\kappa}=10\kappa$, the error decreases as $\mathcal{O}\left(e^{-\tau\sqrt{\kappa}n}\right)$ where $\tau\approx 0.12$.

  • Assume now that $\alpha$ is arbitrarily set to $\alpha=\frac{9}{10}$ without knowing the actual value of $\kappa$. Then $\alpha=1-\theta$ with $\theta=\frac{1}{10}$. If $\theta\geqslant\frac{3}{2}\sqrt{\kappa}$ (i.e. if $\kappa\leqslant\frac{1}{225}$), Corollary 2 states that the error along the iterates of (V-FISTA) for this choice of $\alpha$ decreases in the worst case as $e^{-\tau\kappa n}$, where $\tau=\frac{20}{3}\left(1-\frac{20}{3}\sqrt{\kappa}\right)$. Since $\sqrt{\kappa}\leqslant\frac{1}{15}$, this $\tau$ is bounded from below by $\frac{100}{27}\approx 3.7$. As a consequence, if $F$ is sufficiently ill-conditioned, (V-FISTA) with this choice of $\alpha$ has better theoretical guarantees than Forward-Backward, which ensures $F(x_n)-F^*=\mathcal{O}\left(e^{-\kappa n}\right)$ [20].
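Both examples reduce to evaluating the rate of Corollary 2. The minimal Python sketch below uses only the formulas stated in Corollary 2; the function name `corollary2_exponent` is hypothetical. It returns the exponent $r$ such that $F(x_n)-F^*=\mathcal{O}(e^{-rn})$.

```python
import math


def corollary2_exponent(theta, kappa):
    """Exponent r with F(x_n) - F* = O(exp(-r n)) from Corollary 2 (alpha = 1 - theta, s = 1/L)."""
    if kappa > 0.1 or not (1.5 * math.sqrt(kappa) <= theta < 1):
        raise ValueError("Corollary 2 needs kappa <= 1/10 and 3/2*sqrt(kappa) <= theta < 1")
    a = 2.0 / (3.0 * theta)
    tau = a * (1.0 - a * math.sqrt(kappa))
    return tau * kappa


# First example: alpha = 1 - (3/2) sqrt(kappa_tilde) with kappa_tilde = 10 kappa.
kappa = 1e-4
theta = 1.5 * math.sqrt(10 * kappa)
print(corollary2_exponent(theta, kappa) / math.sqrt(kappa))  # rate in units of sqrt(kappa)

# Second example: alpha = 9/10 (theta = 1/10), compared with Forward-Backward (exponent kappa).
for kappa in [1e-3, 1e-4, 1e-6]:
    r = corollary2_exponent(0.1, kappa)
    print(kappa, r / kappa)  # tau in e^{-tau kappa n}
```

The first print gives approximately 0.121, in agreement with $\tau\approx 0.12$. In the loop, the ratio $r/\kappa$ stays above $100/27>1$, i.e. above the Forward-Backward exponent $\kappa$.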

The convergence guarantees obtained in non-optimal cases and the robustness to an overestimation of the growth parameter are the main contributions of this work. To the authors' knowledge, no such results exist in the literature. They provide a better understanding of the behavior of the iterates of (V-FISTA) for a wide range of values of $\alpha$.
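This robustness can be observed on a small example. The Python sketch below runs an inertial proximal gradient scheme with constant inertia. It assumes the standard V-FISTA form $y_n=x_n+\alpha(x_n-x_{n-1})$, $x_{n+1}=\mathrm{prox}_{sh}(y_n-s\nabla f(y_n))$ with $s=1/L$.

The test problem is hypothetical and chosen only for illustration: a rank-deficient least-squares problem with $h=0$. Its minimizers form an affine subspace, and $\mathcal{G}^2_\mu$ holds with $\mu$ the smallest positive eigenvalue of $A^\top A$. The sketch compares four choices of inertia:

- Forward-Backward ($\alpha=0$);
- $\alpha=1-\frac{3}{2}\sqrt{\kappa}$;
- the same rule with $\mu$ overestimated tenfold;
- the arbitrary choice $\alpha=\frac{9}{10}$.

```python
import numpy as np


def v_fista(grad_f, prox_h, x0, L, alpha, n_iter):
    """Constant-inertia proximal gradient (assumed V-FISTA form) with step s = 1/L.

    y_n = x_n + alpha (x_n - x_{n-1}),  x_{n+1} = prox_{s h}(y_n - s grad f(y_n)).
    alpha = 0 gives Forward-Backward.
    """
    s = 1.0 / L
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(n_iter):
        y = x + alpha * (x - x_prev)
        x_prev, x = x, prox_h(y - s * grad_f(y), s)
    return x


# Hypothetical instance: A is 30 x 50 with rank 20, so F is not strongly convex.
rng = np.random.default_rng(0)
m, d, r = 30, 50, 20
U, _ = np.linalg.qr(rng.standard_normal((m, r)))
V, _ = np.linalg.qr(rng.standard_normal((d, r)))
svals = np.logspace(0, -2, r)                 # singular values from 1 down to 0.01
A = U @ np.diag(svals) @ V.T
b = rng.standard_normal(m)
L, mu = svals[0] ** 2, svals[-1] ** 2         # L = 1, mu = 1e-4
kappa = mu / L
x_hat = V @ ((U.T @ b) / svals)               # minimum-norm minimizer


def grad_f(x):
    return A.T @ (A @ x - b)


def prox_h(z, s):
    return z                                  # h = 0, so the prox is the identity


def excess(x):
    # For least squares, F(x) - F* = 0.5 ||A (x - x_hat)||^2 exactly.
    return 0.5 * np.linalg.norm(A @ (x - x_hat)) ** 2


alphas = {
    "FB": 0.0,
    "optimal": 1 - 1.5 * np.sqrt(kappa),             # alpha = 1 - (3/2) sqrt(kappa)
    "overestimated": 1 - 1.5 * np.sqrt(10 * kappa),  # kappa_tilde = 10 kappa
    "alpha=0.9": 0.9,
}
errors = {k: excess(v_fista(grad_f, prox_h, np.zeros(d), L, a, 3000)) for k, a in alphas.items()}
for k, err in errors.items():
    print(f"{k:>14s}: F(x_n) - F* = {err:.2e}")
```

On this instance, all three inertial choices should end well below Forward-Backward after 3000 iterations, in line with Corollaries 1 and 2. The exact numbers depend on the random instance.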

4 Convergence rates for the trajectories of the Heavy Ball dynamical system

The so-called Heavy Ball equation

$$\ddot{x}(t)+\alpha\dot{x}(t)+\nabla F(x(t))=0, \qquad (30)$$

where $\alpha>0$ denotes the friction parameter, has been studied for decades. See for example Attouch et al. [3] for a general study of this dynamical system: existence of solutions, link with the mechanical system, and convergence of the trajectories to critical points when no strong assumptions are made on $F$. The main result of this section shows that if $F$ is convex, differentiable and satisfies a quadratic growth condition, the solution of (30) achieves a fast exponential decay. A crucial point here is that the uniqueness of the minimizer of $F$ is not needed to get such fast rates. Before stating Theorem 3, we give an overview of the literature. We highlight three facts:

- slower exponential decays are already known in this setting;
- a fast decay is known if the minimizer of $F$ is assumed to be unique;
- various other decays can be achieved in non-convex settings.

Note that the proof of Theorem 3 has served as a guide for the analysis of the Heavy Ball algorithm performed in Section 3 and for the proof of Theorem 2.

4.1 State of the art

The first study of the convergence rate of $F(x(t))-F^*$, where $x$ is a solution of (30), under a strong convexity assumption is due to Polyak [32]. In his seminal work, Polyak observes that if $F$ is a quadratic function, $F(x)=\|Ax\|^2$, the solution of (30) ensures the fastest decay of $F(x(t))-F(x^*)$ when $\alpha=\sqrt{\mu}$, where $\mu$ is the smallest positive eigenvalue of $A^\top A$. He deduced that if $F$ is $C^2$ and $\mu$-strongly convex $(\mathcal{S}_\mu)$, the solution of (HBF) satisfies

$$F(x(t))-F(x^*)=\mathcal{O}\left(e^{-2\sqrt{\mu}\,t}\right). \qquad (31)$$
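The role of the friction can be seen in the simplest case. The Python sketch below assumes the one-dimensional normalization $F(x)=\mu x^2/2$, so that $\nabla F(x)=\mu x$; with a different normalization of $F$, the optimal friction value changes accordingly. The sketch computes the exponential rate of $F(x(t))$ from the characteristic roots of $\ddot x+\alpha\dot x+\mu x=0$ and scans $\alpha$. Under this normalization, the best rate $2\sqrt{\mu}$ of (31) is attained at critical damping $\alpha=2\sqrt{\mu}$, up to a polynomial factor at that value.

```python
import numpy as np


def quadratic_hbf_rate(alpha, mu):
    """Exponential rate K of F(x(t)) = mu x(t)^2 / 2 along x'' + alpha x' + mu x = 0.

    The characteristic roots are (-alpha +/- sqrt(alpha^2 - 4 mu)) / 2 and F behaves like
    x^2, so K = -2 * max Re(root) (up to a polynomial factor at critical damping).
    """
    disc = np.asarray(alpha, dtype=float) ** 2 - 4.0 * mu
    return alpha - np.sqrt(np.maximum(disc, 0.0))


mu = 1.0
alphas = np.linspace(0.1, 4.0, 3901)
rates = quadratic_hbf_rate(alphas, mu)
best = int(np.argmax(rates))
print("best friction:", alphas[best], " rate:", rates[best])
print("rate with the friction of Theorem 3:", quadratic_hbf_rate((2 - np.sqrt(2) / 2) * np.sqrt(mu), mu))
```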

Both $C^2$ regularity and strong convexity are needed to achieve such a decay. During the last decades, several works have provided various bounds depending on geometrical assumptions made on $F$; a summary is given in Table 3. In all these works, a Łojasiewicz condition with exponent $\frac{1}{2}$ is required to achieve an exponential decay of $F(x(t))-F(x^*)$. In [12], the authors proved that with other Łojasiewicz exponents only polynomial decay can be achieved. Nevertheless, the exact exponential rate depends strongly on the assumptions made on $F$. If $F$ is $\mu$-strongly convex and $C^1$, it was first proved that the decay is $\mathcal{O}(e^{-\sqrt{\mu}t})$; see for example Siegel [35] for a simple proof. Aujol et al. [7] extend this result in two directions:

- a better rate for functions that are quasi-strongly convex with a unique minimizer;
- a weaker rate if $F$ only satisfies a quadratic growth condition and has a unique minimizer.

See Table 3 for more details. All these results ensure fast exponential decays of $F(x(t))-F(x^*)$. They all assume the convexity of $F$, a quadratic growth condition and the uniqueness of the minimizer of $F$.

In [13], Bégout et al. provide several results on the trajectory $x(t)$, solution of (HBF), when $F$ is a $C^2$ function satisfying a Łojasiewicz hypothesis but is not necessarily convex. The authors prove that under such a hypothesis the trajectory strongly converges to a minimizer of $F$, and they provide several decay rates. If $F$ satisfies a Łojasiewicz hypothesis with an exponent $\theta\in\left]0,\frac{1}{2}\right[$, the decay rate is polynomial; see [13, Corollary 5.1]. These polynomial bounds are similar to those achieved by the gradient flow under similar hypotheses. If $F$ satisfies a Łojasiewicz hypothesis with exponent $\theta=\frac{1}{2}$, the rate is exponential. More recent works by Polyak et al. [31] and Apidopoulos et al. [2] also provide exponential decay under a Łojasiewicz hypothesis on $F$, with rates depending on $L$ and $\mu$; see Table 3.

Reference | Assumption on $F$ | Exponential rate $K$ of $F(x(t))-F^*$
Polyak [32] | $\mathcal{S}_\mu$, $C^2$ and convexity | $2\sqrt{\mu}$
Siegel [35] | $\mathcal{S}_\mu$ and convexity | $\sqrt{\mu}$
Aujol et al. [8] | $\mathcal{G}^2_\mu$, convexity and uniqueness of the minimizer | $(2-\sqrt{2})\sqrt{\mu}$
Polyak, Shcherbakov [31] | $C^2$, Łojasiewicz with $\theta=\frac{1}{2}$ and constant $c_\ell$, $L$-Lipschitz gradient | $\frac{2\mu^{3/2}}{(\sqrt{2}+1)\mu+L}$ with $\mu=\frac{1}{2c_\ell^2}$
Apidopoulos et al. [2] | Łojasiewicz with $\theta=\frac{1}{2}$ and constant $c_\ell$, $L$-Lipschitz gradient | $2\left(\sqrt{\frac{L}{\mu}}-\sqrt{\frac{L-\mu}{\mu}}\right)\sqrt{\mu}$ with $\mu=\frac{1}{2c_\ell^2}$

Table 3: Convergence rate of $F(x(t))-F^*$ where $x$ is a solution of (HBF), for the best choice of $\alpha$ (which depends on the assumptions satisfied by $F$). The constant $K$ is defined such that $F(x(t))-F^*=\mathcal{O}\left(e^{-Kt}\right)$.

The goal of the next part is to show that under convexity and quadratic growth conditions, a faster exponential rate, independent of L𝐿Litalic_L, can be achieved for the solution of the Heavy Ball dynamical system (HBF) without assuming the uniqueness of the minimizer.

4.2 Fast exponential decay under quadratic growth conditions

We consider the Heavy Ball Friction (HBF) system defined as follows:

$$\forall t\geqslant t_0,\quad \ddot{x}(t)+\alpha\dot{x}(t)+\nabla F(x(t))=0, \qquad \text{(HBF)}$$

where $t_0\geqslant 0$, $\alpha>0$ and $F:\mathbb{R}^N\to\mathbb{R}$ is a convex differentiable function satisfying a quadratic growth condition. Generalizing recent works [35, 8, 7] and making assumptions on the regularity of the boundary of the set of minimizers, we prove that the trajectories of (HBF) achieve a fast exponential decay:

Theorem 3.

Let $F$ be a convex differentiable function having a non-empty set of minimizers $X^*$. Suppose that $X^*$ has a $C^2$ boundary or that it is a polyhedral set. Let $x$ be a solution of (HBF) for some $t_0\geqslant 0$ and $\alpha>0$. If $F$ satisfies the assumption $\mathcal{G}^2_\mu$ for some $\mu>0$ and $\alpha=\left(2-\frac{\sqrt{2}}{2}\right)\sqrt{\mu}$, then

$$\forall t\geqslant t_0,\quad F(x(t))-F^*\leqslant\left(\frac{11}{2}-2\sqrt{2}\right)M_0\,e^{-(2-\sqrt{2})\sqrt{\mu}\,(t-t_0)}, \qquad (32)$$

where $M_0=F(x(t_0))-F^*+\frac{1}{2}\|\dot{x}(t_0)\|^2$. Moreover,

$$\|\dot{x}(t)\|=\mathcal{O}\left(e^{-\left(1-\frac{\sqrt{2}}{2}\right)\sqrt{\mu}\,t}\right). \qquad (33)$$

Elements of proof are given in the following section, and a proof of the second claim is given in Section A.1.
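Theorem 3 can be illustrated (not proved) numerically. The sketch below integrates (HBF) with a classical fourth-order Runge-Kutta scheme, which is our choice of integrator. The function is hypothetical: $F(x)=\frac{\mu}{2}(\langle a,x\rangle-c)^2$ on $\mathbb{R}^2$ with $\|a\|=1$. Its minimizers form a line, so $X^*$ is polyhedral and not a singleton, and $F(x)-F^*=\frac{\mu}{2}d(x,X^*)^2$, so $\mathcal{G}^2_\mu$ holds. With $\alpha=\left(2-\frac{\sqrt{2}}{2}\right)\sqrt{\mu}$, the sketch checks the bound (32) on the time grid. It also shows that two initial velocities lead to two different limit points in $X^*$.

```python
import numpy as np


def simulate_hbf(grad_F, x0, v0, alpha, t_end, dt=2e-3):
    """Integrate x'' + alpha x' + grad F(x) = 0 from t0 = 0 with classical RK4.

    Returns the time grid and the positions x(t) (one row per time step).
    """
    def rhs(x, v):
        return v, -alpha * v - grad_F(x)

    x, v = np.asarray(x0, dtype=float), np.asarray(v0, dtype=float)
    n = int(round(t_end / dt))
    xs = [x.copy()]
    for _ in range(n):
        k1x, k1v = rhs(x, v)
        k2x, k2v = rhs(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v)
        k3x, k3v = rhs(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v)
        k4x, k4v = rhs(x + dt * k3x, v + dt * k3v)
        x = x + dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v = v + dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        xs.append(x.copy())
    return dt * np.arange(n + 1), np.array(xs)


mu, c = 1.0, 1.0
a = np.array([1.0, 1.0]) / np.sqrt(2.0)      # unit normal of the line of minimizers


def F(x):
    return 0.5 * mu * (x @ a - c) ** 2       # F* = 0 and X* = {x : <a, x> = c}


def grad_F(x):
    return mu * (x @ a - c) * a


alpha = (2 - np.sqrt(2) / 2) * np.sqrt(mu)   # friction prescribed by Theorem 3
x0, v0 = np.array([3.0, -1.0]), np.array([0.5, 2.0])
ts, xs = simulate_hbf(grad_F, x0, v0, alpha, t_end=15.0)

M0 = F(x0) + 0.5 * v0 @ v0
errors = np.array([F(x) for x in xs])
bound = (11 / 2 - 2 * np.sqrt(2)) * M0 * np.exp(-(2 - np.sqrt(2)) * np.sqrt(mu) * ts)
print("bound (32) holds on the grid:", bool(np.all(errors <= bound + 1e-12)))

# Same starting point, different initial velocity: a different minimizer is reached.
_, xs2 = simulate_hbf(grad_F, x0, np.array([2.0, 0.5]), alpha, t_end=15.0)
print("limit points:", xs[-1], xs2[-1])
```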

Proposition 1.

Let $F$ be a convex differentiable function having a non-empty set of minimizers $X^*$. Suppose that $X^*$ has a $C^2$ boundary or that it is a polyhedral set. Assume that $F$ is $\mu$-quasi-strongly convex, i.e. there exists $\mu>0$ such that:

$$\forall x\in\mathbb{R}^N,\quad \langle\nabla F(x),x-x^*\rangle\geqslant F(x)-F^*+\frac{\mu}{2}\|x-x^*\|^2,$$

where $x^*$ denotes the projection of $x$ onto $X^*$. Let $x$ be a solution of (HBF) for some $t_0\geqslant 0$ and $\alpha>0$. Then, if $\alpha=\frac{3}{\sqrt{2}}\sqrt{\mu}$:

$$\forall t\geqslant t_0,\quad F(x(t))-F^*\leqslant 39\,M_0\,e^{-\sqrt{2\mu}\,(t-t_0)}, \qquad (34)$$

where $M_0=F(x(t_0))-F^*+\frac{1}{2}\|\dot{x}(t_0)\|^2$.
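Here is a concrete instance of the quasi-strong convexity assumption of Proposition 1 without a unique minimizer. Consider the hypothetical example $F(x)=\frac12\|Ax-b\|^2$ with a rank-deficient $A$, so that $X^*=\hat{x}+\ker A$. A short computation gives

$$\langle\nabla F(x),x-x^*\rangle-(F(x)-F^*)=\tfrac12\|A(x-x^*)\|^2\geqslant\tfrac{\mu}{2}\|x-x^*\|^2,$$

where $\mu$ is the smallest positive eigenvalue of $A^\top A$, since $x-x^*$ lies in $(\ker A)^\perp$. The Python sketch below checks this inequality on random points and shows that it is tight along the weakest direction.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 5)) @ rng.standard_normal((5, 12))  # 8 x 12, rank 5
b = rng.standard_normal(8)

# Thin SVD restricted to the numerically nonzero singular values.
U, svals, Vt = np.linalg.svd(A, full_matrices=False)
keep = svals > 1e-10 * svals[0]
U_r, s_r, V_r = U[:, keep], svals[keep], Vt[keep].T
x_hat = V_r @ ((U_r.T @ b) / s_r)   # minimum-norm minimizer
mu = s_r.min() ** 2                 # smallest positive eigenvalue of A^T A


def F(x):
    return 0.5 * np.sum((A @ x - b) ** 2)


def grad_F(x):
    return A.T @ (A @ x - b)


def project(x):
    """Projection onto X* = x_hat + ker(A): remove the row-space part of x - x_hat."""
    return x - V_r @ (V_r.T @ (x - x_hat))


def qsc_slack(x):
    """<grad F(x), x - x*> - (F(x) - F* + mu/2 ||x - x*||^2); nonnegative under mu-QSC."""
    e = x - project(x)
    return grad_F(x) @ e - (F(x) - F(x_hat) + 0.5 * mu * e @ e)


slacks = [qsc_slack(10 * rng.standard_normal(12)) for _ in range(1000)]
print("rank:", int(keep.sum()), " mu:", mu, " min slack:", min(slacks))
print("slack along the weakest direction:", qsc_slack(x_hat + 5 * V_r[:, np.argmin(s_r)]))
```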

Remark 1.

Theorem 3 and Proposition 1 rely on the assumption that the set of minimizers $X^*$ has a $C^2$ boundary or is a polyhedral set. More generally, the corresponding statements hold if $X^*$ is second-order regular in the sense of Bonnans et al. [14], which is a weaker assumption. Given the technical nature of this hypothesis, the results are stated for these special cases. We refer the interested reader to the above reference for more details.

Remark 2.

The fact that a regularity assumption on the set of minimizers $X^*$ is needed to obtain these results is a curiosity, since no such hypothesis is required in the discrete case, i.e. for Theorems 1 and 2. It is directly related to the continuous-time nature of the trajectory $x$.

4.2.1 Comparisons and comments

The first study of (HBF) was proposed by Polyak [32]. In this seminal work, Polyak considers $C^2$ $\mu$-strongly convex functions and proves that, for such functions, the solution $x$ of (HBF) satisfies $F(x(t))-F(x^*)=\mathcal{O}(e^{-2\sqrt{\mu}t})$ for a suitable choice of the friction parameter $\alpha$. If $F$ is only $C^1$ and $\mu$-strongly convex, the convergence rate is weaker; see for example [35, 7]. Aujol et al. [7] consider the case where $F$ is $C^1$, satisfies a quadratic growth condition and has a unique minimizer, which is a weaker assumption than strong convexity. They proved that the solution of (HBF) satisfies $F(x(t))-F(x^*)=\mathcal{O}(e^{-(2-\sqrt{2})\sqrt{\mu}t})$ for another choice of the friction parameter $\alpha$, which is slightly slower than the rate achieved by Polyak. All the above works use the fact that the function $F$ to minimize has a unique minimizer.
Indeed, if $F$ is $C^1$, convex, satisfies a quadratic growth condition and has several minimizers, there was no result ensuring that the solution of (HBF) satisfies $F(x(t))-F(x^*)=\mathcal{O}(e^{-C\sqrt{\mu}t})$ for any $C>0$. As far as we know, Theorem 3 is the first result ensuring such a decay rate on this class of convex functions. This continuous-time result guided the proof of Theorem 2, which ensures a fast decay for an inertial algorithm on the same class of convex functions.

Several other articles provide interesting results on the decay rate of the solution of the Heavy Ball ODE (HBF). In [31, 2, 12], the authors provide general analyses based on Łojasiewicz properties, with results on the trajectory $x(t)$ or on the error $F(x(t))-F(x^*)$. A fair comparison between these results and Theorem 3 is not simple, because our analysis relies on convexity whereas the global analysis of these works does not use this assumption. Nevertheless, [12] and [2] provide some decay bounds when the convexity assumption is added. More precisely, Corollary 5.5 of [12] ensures the following: if $F$ is convex, $C^2$ and satisfies a quadratic growth condition with parameter $\mu$, then the trajectory $x(t)$ converges to a minimizer $x^*$ of $F$, the length of the trajectory is finite, and $\|x(t)-x^*\|^2=\mathcal{O}(e^{-\mu t})$.
For such functions, the quadratic growth condition gives $d(x(t),X^*)^2\leqslant\frac{2}{\mu}(F(x(t))-F(x^*))$. Combined with Theorem 3, this yields $d(x(t),X^*)^2=O(e^{-C\sqrt{\mu}t})$, which is a better decay rate of the trajectory towards the set of minimizers, since $\sqrt{\mu}\gg\mu$ for small $\mu$.
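To make this comparison concrete, here is a small numerical sketch built on a toy example of our own, not taken from the paper. It assumes the Heavy Ball ODE in the form $\ddot{x}+a\dot{x}+\nabla F(x)=0$ with a hypothetical friction `FRICTION` $=2\sqrt{\mu}$, and the function $F(x)=\frac12(x_0+x_1-1)^2+\frac{\mu}{2}x_2^2$. Its minimizers form a whole line, so $F$ is convex with quadratic growth but not strongly convex. Time stepping uses a semi-implicit Euler scheme. The printout tracks $F(x(t))-F^*$ and checks numerically that $d(x(t),X^*)^2\leqslant\frac{2}{\mu}(F(x(t))-F^*)$.

```python
import math

MU = 0.1                       # quadratic growth parameter of the toy function below
FRICTION = 2 * math.sqrt(MU)   # hypothetical friction a in x'' + a x' + grad F(x) = 0


def F(x):
    """Toy objective 1/2 (x0 + x1 - 1)^2 + MU/2 x2^2; minimizers form the line {x0 + x1 = 1, x2 = 0}, F* = 0."""
    r = x[0] + x[1] - 1.0
    return 0.5 * r * r + 0.5 * MU * x[2] ** 2


def grad_F(x):
    r = x[0] + x[1] - 1.0
    return [r, r, MU * x[2]]


def dist2_to_minimizers(x):
    """Squared distance d(x, X*)^2 to the line of minimizers."""
    r = x[0] + x[1] - 1.0
    return 0.5 * r * r + x[2] ** 2


def simulate_hbf(x_init, t_end=60.0, dt=1e-3, every=1000):
    """Semi-implicit Euler for x'' + a x' + grad F(x) = 0, starting at rest; samples (t, F - F*, d^2)."""
    x, v = list(x_init), [0.0] * len(x_init)
    samples = []
    for k in range(int(round(t_end / dt)) + 1):
        if k % every == 0:
            samples.append((k * dt, F(x), dist2_to_minimizers(x)))
        g = grad_F(x)
        v = [vi + dt * (-FRICTION * vi - gi) for vi, gi in zip(v, g)]  # velocity update first
        x = [xi + dt * vi for xi, vi in zip(x, v)]                    # then position, with the new velocity
    return samples


samples = simulate_hbf([3.0, -1.0, 2.0])
for t, gap, d2 in samples[::10]:
    print(f"t={t:5.1f}  F-F*={gap:.2e}  d^2={d2:.2e}  (2/mu)(F-F*)={2 / MU * gap:.2e}")
```

The friction value and the discretization are illustrative choices. They are not the parameters under which Theorem 3 is stated.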

The work of Apidopoulos et al. [2] deepens that of Polyak et al. [31] by providing an explicit decay of $F(x(t))-F(x^*)$ under similar hypotheses, i.e. Łojasiewicz properties, $C^2$ assumptions and a uniform bound on the Hessian of $F$ in a neighborhood of the set of minimizers. That is why we compare our results to those of [2]. The same conclusions hold for [31]. The bounds provided by the authors depend on a uniform bound $L$ on the Hessian of $F$, which is not the case for Theorem 3. Moreover, the bound of Theorem 3 is better than that of [2, Theorem 2] when $\frac{L}{\mu}>3$. The analysis of Apidopoulos et al. was developed in a non-convex setting, where using this bound on the Hessian seems to be the only known way to bound $F(x(t))-F(x^*)$. Convexity seems to be the price to pay for bounds that are independent of this Lipschitz constant $L$.

This analysis of the Heavy Ball dynamical system provides a guideline for the analysis of the Heavy Ball algorithm.

Remark 3.

Even if the convexity of $F$ seems to be a key hypothesis to reach such a decay rate, the careful reader may notice that Theorem 3 actually holds for star-convex functions, i.e. functions satisfying:

$$\forall x\in\mathbb{R}^N,\ \forall x^*\in X^*,\quad F(x)-F(x^*)\leqslant\langle x-x^*,\nabla F(x)\rangle,$$

where $X^*$ denotes the set of minimizers of $F$.
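As an illustration of this remark, the sketch below uses a hypothetical example of our own. It takes the degree-2 positively homogeneous function $F(x)=r^2(1+C\cos 4\theta)$ with $0<C<1$, written in Cartesian coordinates. By Euler's identity, $\langle x,\nabla F(x)\rangle=2F(x)\geqslant F(x)$, so $F$ is star convex with respect to its unique minimizer $0$. A midpoint test shows that $F$ is nevertheless not convex. The code checks both facts numerically.

```python
import random

C = 0.9  # hypothetical modulation amplitude; any 0 < C < 1 keeps F > 0 away from 0


def F(x, y):
    """F = r^2 (1 + C cos 4θ) in Cartesian form: positively homogeneous of degree 2, unique minimizer 0, F* = 0."""
    r2 = x * x + y * y
    if r2 == 0.0:
        return 0.0
    q = x ** 4 - 6 * x * x * y * y + y ** 4   # q = r^4 cos(4θ)
    return r2 + C * q / r2


def grad_F(x, y):
    r2 = x * x + y * y
    if r2 == 0.0:
        return 0.0, 0.0
    q = x ** 4 - 6 * x * x * y * y + y ** 4
    qx = 4 * x ** 3 - 12 * x * y * y
    qy = 4 * y ** 3 - 12 * x * x * y
    return (2 * x + C * (qx * r2 - 2 * x * q) / r2 ** 2,
            2 * y + C * (qy * r2 - 2 * y * q) / r2 ** 2)


def star_convex_at(x, y, tol=1e-12):
    """Check F(x) - F(x*) <= <x - x*, grad F(x)> with X* = {0}."""
    gx, gy = grad_F(x, y)
    return F(x, y) <= x * gx + y * gy + tol


rng = random.Random(0)
points = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(1000)]
print(all(star_convex_at(x, y) for x, y in points))   # True

# Jensen's inequality fails at the midpoint of a = (1, 0.5) and b = (1, -0.5), so F is not convex
a, b = (1.0, 0.5), (1.0, -0.5)
print(F(1.0, 0.0) > 0.5 * (F(*a) + F(*b)))             # True
```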

4.2.2 Elements of proof

The results obtained in this paper rely on a Lyapunov approach. Recall that when $F$ has a unique minimizer, i.e. $X^*=\{x^*\}$, a classical choice of Lyapunov energy for (HBF) is:

$$\mathcal{E}(t)=F(x(t))-F^*+\frac{1}{2}\|\lambda(x(t)-x^*)+\dot{x}(t)\|^2+\frac{\xi}{2}\|x(t)-x^*\|^2,\tag{35}$$

for some well-chosen parameters $\lambda>0$ and $\xi\in\mathbb{R}$. To extend this type of analysis without the uniqueness assumption, we adapt the Lyapunov energy to our relaxed setting. Let $F$ have a non-empty set of minimizers $X^*$ which is not reduced to a single point, and let

$$\mathcal{E}^*(t)=F(x(t))-F^*+\frac{1}{2}\|\lambda(x(t)-x^*(t))+\dot{x}(t)\|^2+\frac{\xi}{2}\|x(t)-x^*(t)\|^2,\tag{36}$$

where, for all $t\geqslant t_0$, $x^*(t)$ denotes the projection of $x(t)$ onto $X^*$, i.e.

$$x^*(t)=\operatorname*{arg\,min}_{x^*\in X^*}\|x(t)-x^*\|^2:=P_{X^*}(x(t)).$$

This slight modification raises a question when differentiating the Lyapunov energy: is $t\mapsto x^*(t)$ differentiable?

The smoothness of $t\mapsto x^*(t)$ is related to that of $P_{X^*}$. In fact, since $P_{X^*}$ is Lipschitz continuous and $x$ is continuously differentiable, if $P_{X^*}$ is directionally differentiable then $t\mapsto x^*(t)$ is right-differentiable (and left-differentiable), and its right-hand derivative is equal to $P'_{X^*}(x(t),\dot{x}(t))$.

In [14, Theorem 7.2], Bonnans et al. prove that if a closed convex set $\mathcal{S}\subset\mathcal{X}$ is second-order regular at $P_{\mathcal{S}}(x)$ for some $x\in\mathcal{X}$, then $P_{\mathcal{S}}$ is directionally differentiable at $x$. We refer the reader to [14, 34] for a complete account of the notion of second-order regularity. Note that this geometric assumption is satisfied by sets with a $C^2$ boundary [23] and by polyhedral sets [34].
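The following sketch illustrates the one-sided differentiability of $t\mapsto x^*(t)$ on a polyhedral set. It assumes that $X^*$ is the box $[0,1]^2$ and that $x(t)$ is the hypothetical straight line coded in `curve`. It then compares finite differences of $P_{X^*}(x(t))$ with the directional derivative $P'_{X^*}(x,\dot{x})$ of the coordinate-wise clip. At $t=0$ the curve hits a face of the box, and the right derivative $P'_{X^*}(x,\dot{x})$ differs from the left derivative $-P'_{X^*}(x,-\dot{x})$.

```python
def proj_box(x, lo=0.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^N, a polyhedral (hence second-order regular) set."""
    return [min(max(xi, lo), hi) for xi in x]


def proj_box_dir_deriv(x, d, lo=0.0, hi=1.0):
    """Directional derivative P'(x, d) of the box projection, computed coordinate by coordinate."""
    out = []
    for xi, di in zip(x, d):
        if lo < xi < hi:
            out.append(di)              # interior: the clip acts as the identity
        elif xi == hi:
            out.append(min(di, 0.0))    # on the upper face: only inward motion is kept
        elif xi == lo:
            out.append(max(di, 0.0))    # on the lower face
        else:
            out.append(0.0)             # strictly outside: the clip is locally constant
    return out


def curve(t):
    """Hypothetical C^1 curve hitting the face {x_0 = 1} of the box at t = 0, with velocity (1, -0.2)."""
    return [1.0 + t, 0.3 - 0.2 * t]


h = 1e-6
x0, v0 = curve(0.0), [1.0, -0.2]
right_fd = [(a - b) / h for a, b in zip(proj_box(curve(h)), proj_box(x0))]
left_fd = [(a - b) / h for a, b in zip(proj_box(x0), proj_box(curve(-h)))]
right_exact = proj_box_dir_deriv(x0, v0)                            # P'(x, v)
left_exact = [-c for c in proj_box_dir_deriv(x0, [-c for c in v0])]  # -P'(x, -v)
print(right_fd, right_exact)   # both approximately [0, -0.2]
print(left_fd, left_exact)     # both approximately [1, -0.2]
```

This is why only right-hand derivatives are used in what follows.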

Throughout the rest of this section, we assume that the set of minimizers is second-order regular. Consequently, $t\mapsto x^*(t)$ is right-differentiable, and so is $\mathcal{E}^*$. For the sake of simplicity, let $\dot{x^*}$ and $\dot{\mathcal{E}}^*$ denote the corresponding right-hand derivatives. We can write:

$$\dot{\mathcal{E}}^*(t)=D(t)-(\lambda^2+\xi)\langle x(t)-x^*(t),\dot{x^*}(t)\rangle-\lambda\langle\dot{x}(t),\dot{x^*}(t)\rangle,\tag{37}$$

where

$$D(t)=\langle\nabla F(x(t)),\dot{x}(t)\rangle+\langle\lambda(x(t)-x^*(t))+\dot{x}(t),\lambda\dot{x}(t)+\ddot{x}(t)\rangle+\xi\langle x(t)-x^*(t),\dot{x}(t)\rangle.$$
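Formula (37) can be sanity-checked numerically. The sketch below assumes the toy function $F(x)=\frac12(x_0+x_1-1)^2+\frac{\mu}{2}x_2^2$, whose set of minimizers is a line (hence polyhedral and second-order regular), together with hypothetical values `LAM` and `XI` of $\lambda$ and $\xi$. It also uses an arbitrary smooth curve `curve` that is not a solution of (HBF). Since (37) is a pure differentiation identity, any smooth curve will do. Because $X^*$ is affine here, $x^*(t)$ is even differentiable, so a central difference is legitimate. The code evaluates $\mathcal{E}^*$ with $x^*(t)$ in both the middle and last terms of (36), compares its finite difference with the right-hand side of (37), and checks that the result does not exceed $D(t)$.

```python
import math

MU = 0.1
LAM, XI = 0.7, -0.05   # hypothetical values of lambda > 0 and xi in R


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def F(x):
    r = x[0] + x[1] - 1.0
    return 0.5 * r * r + 0.5 * MU * x[2] ** 2    # toy objective with F* = 0


def grad_F(x):
    r = x[0] + x[1] - 1.0
    return [r, r, MU * x[2]]


def proj(x):
    """Projection onto X* = {x0 + x1 = 1, x2 = 0}."""
    r = x[0] + x[1] - 1.0
    return [x[0] - r / 2, x[1] - r / 2, 0.0]


def proj_deriv(d):
    """X* is an affine line, so P is affine and P'(x, d) is its linear part applied to d."""
    s = d[0] + d[1]
    return [d[0] - s / 2, d[1] - s / 2, 0.0]


def curve(t):
    """Arbitrary smooth test curve with its first and second derivatives (not a solution of (HBF))."""
    pos = [math.cos(t), math.sin(2 * t), t * t / 3]
    vel = [-math.sin(t), 2 * math.cos(2 * t), 2 * t / 3]
    acc = [-math.cos(t), -4 * math.sin(2 * t), 2 / 3]
    return pos, vel, acc


def energy(t):
    """E*(t) from (36), with x*(t) in both the middle and the last term."""
    pos, vel, _ = curve(t)
    e = [a - b for a, b in zip(pos, proj(pos))]          # x(t) - x*(t)
    m = [LAM * ei + vi for ei, vi in zip(e, vel)]
    return F(pos) + 0.5 * dot(m, m) + 0.5 * XI * dot(e, e)


def D(t):
    pos, vel, acc = curve(t)
    e = [a - b for a, b in zip(pos, proj(pos))]
    m = [LAM * ei + vi for ei, vi in zip(e, vel)]
    return (dot(grad_F(pos), vel)
            + dot(m, [LAM * vi + ai for vi, ai in zip(vel, acc)])
            + XI * dot(e, vel))


def energy_rate(t):
    """Right-hand side of (37)."""
    pos, vel, _ = curve(t)
    e = [a - b for a, b in zip(pos, proj(pos))]
    xstar_dot = proj_deriv(vel)                          # derivative of t -> x*(t)
    return D(t) - (LAM ** 2 + XI) * dot(e, xstar_dot) - LAM * dot(vel, xstar_dot)


t0, h = 0.8, 1e-5
finite_diff = (energy(t0 + h) - energy(t0 - h)) / (2 * h)
print(f"finite difference: {finite_diff:.8f}  formula (37): {energy_rate(t0):.8f}  D(t0): {D(t0):.8f}")
```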

Observe that $D$ is exactly equal to $\dot{\mathcal{E}}$, the derivative of (35), when $F$ has a unique minimizer $x^*$. The objective is then to control the additional terms $\langle x(t)-x^*(t),\dot{x^*}(t)\rangle$ and $\langle\dot{x}(t),\dot{x^*}(t)\rangle$. Figure 2 gives an intuition of the behaviour of these terms.

[Figure 2. Each panel shows $X^*$, a point $x(t)$ with its projection $x^*(t)$, and the vectors $\dot{x}(t)$ and $x(t)-x^*(t)$. The left panel also shows $\dot{x^*}(t)$, while in the right panel $\dot{x^*}(t)=0$.]
Figure 2: Behaviour of $\dot{x^*}$ for a set of minimizers with a $C^2$ boundary (left) and a polyhedral set of minimizers (right).

We can first prove that $\langle\dot{x}(t),\dot{x^*}(t)\rangle$ is nonnegative by using the expression $\dot{x^*}(t)=\lim_{h\rightarrow 0^+}\frac{x^*(t+h)-x^*(t)}{h}$ and the characterization of the projection onto a convex set. Indeed, as $X^*$ is a closed convex set, for any $x\in\mathbb{R}^N$ and $u\in X^*$:

$$\langle x-P_{X^*}(x),u-P_{X^*}(x)\rangle\leqslant 0.$$

Thus, for any $h>0$ we have:

$$\begin{aligned}\langle x(t+h)-x(t),x^*(t+h)-x^*(t)\rangle&=\langle x(t+h)-x^*(t+h),x^*(t+h)-x^*(t)\rangle\\&\quad+\|x^*(t+h)-x^*(t)\|^2\\&\quad+\langle x(t)-x^*(t),x^*(t)-x^*(t+h)\rangle\geqslant 0.\end{aligned}$$

The first and last terms are nonnegative by the projection inequality above, applied at $x(t+h)$ with $u=x^*(t)$ and at $x(t)$ with $u=x^*(t+h)$. Dividing by $h^2$ and letting $h$ tend to $0^+$, we deduce that $\langle\dot{x}(t),\dot{x^*}(t)\rangle\geqslant 0$.
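The three-term decomposition above can be checked on random data. In this sketch, $X^*$ is replaced by the closed unit ball, a convex set with $C^2$ boundary. This is an assumption made for illustration. Random pairs of points play the roles of $x(t)$ and $x(t+h)$. The code verifies the algebraic identity and checks that the first term, the last term and the left-hand side are all nonnegative up to rounding.

```python
import math
import random


def proj_ball(x):
    """Projection onto the closed unit ball, a convex set with C^2 boundary."""
    n = math.sqrt(sum(xi * xi for xi in x))
    return list(x) if n <= 1.0 else [xi / n for xi in x]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def sub(a, b):
    return [ai - bi for ai, bi in zip(a, b)]


def check_decomposition(trials=10000, seed=1):
    """Return (max decomposition error, smallest of term1, term3 and the left-hand side)."""
    rng = random.Random(seed)
    max_err, smallest = 0.0, math.inf
    for _ in range(trials):
        x_t = [rng.gauss(0, 2) for _ in range(3)]          # plays the role of x(t)
        x_th = [xi + rng.gauss(0, 0.5) for xi in x_t]       # plays the role of x(t + h)
        p_t, p_th = proj_ball(x_t), proj_ball(x_th)
        dp = sub(p_th, p_t)                                  # x*(t + h) - x*(t)
        term1 = dot(sub(x_th, p_th), dp)                     # >= 0 (projection inequality, u = x*(t))
        term2 = dot(dp, dp)                                  # >= 0
        term3 = dot(sub(x_t, p_t), [-c for c in dp])         # >= 0 (projection inequality, u = x*(t + h))
        lhs = dot(sub(x_th, x_t), dp)
        max_err = max(max_err, abs(lhs - (term1 + term2 + term3)))
        smallest = min(smallest, term1, term3, lhs)
    return max_err, smallest


max_err, smallest = check_decomposition()
print(f"decomposition error: {max_err:.1e}, smallest monitored term: {smallest:.1e}")
```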

In [14, Theorem 7.2], the authors give an expression of the directional derivative $P'_{\mathcal{S}}(x,d)$ for a closed convex set $\mathcal{S}\subset\mathcal{X}$ that is second-order regular at $P_{\mathcal{S}}(x)$ for some $x\in\mathcal{X}$. This directional derivative satisfies:

$$\langle x-P_{\mathcal{S}}(x),P'_{\mathcal{S}}(x,d)\rangle=0.$$

Since $\dot{x^*}(t)=P'_{X^*}(x(t),\dot{x}(t))$ and $X^*$ is assumed to be second-order regular, we deduce that $\langle x(t)-x^*(t),\dot{x^*}(t)\rangle=0$ for all $t\geqslant t_0$.
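The orthogonality relation $\langle x-P_{\mathcal{S}}(x),P'_{\mathcal{S}}(x,d)\rangle=0$ can be observed numerically. The sketch below takes $\mathcal{S}$ to be the closed unit ball, a set with $C^2$ boundary chosen for illustration. It approximates $P'_{\mathcal{S}}(x,d)$ by a one-sided finite difference with step $h=10^{-7}$, so the computed inner products are of order $h$ rather than exactly zero.

```python
import math
import random


def proj_ball(x):
    """Projection onto the closed unit ball."""
    n = math.sqrt(sum(xi * xi for xi in x))
    return list(x) if n <= 1.0 else [xi / n for xi in x]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def dir_deriv_fd(x, d, h=1e-7):
    """One-sided finite-difference approximation of P'(x, d) for the unit-ball projection."""
    p0 = proj_ball(x)
    p1 = proj_ball([xi + h * di for xi, di in zip(x, d)])
    return [(a - b) / h for a, b in zip(p1, p0)]


rng = random.Random(2)
worst = 0.0
for _ in range(1000):
    x = [rng.gauss(0, 3) for _ in range(3)]
    d = [rng.gauss(0, 1) for _ in range(3)]
    normal = [a - b for a, b in zip(x, proj_ball(x))]   # x - P(x)
    worst = max(worst, abs(dot(normal, dir_deriv_fd(x, d))))
print(f"largest |<x - P(x), P'(x, d)>| observed: {worst:.1e}")  # O(h), not exactly 0
```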

These results ensure that, for any choice of parameters $\lambda>0$ and $\xi\in\mathbb{R}$, $\dot{\mathcal{E}}^*(t)\leqslant D(t)$: in (37), the second term vanishes and the third one is nonpositive. From this point, the proofs of the convergence results stated in Theorem 3 and Proposition 1 follow the original proofs, with $D$ in place of $\dot{\mathcal{E}}$, combined with the following lemma, whose proof is given in Section A.2.

Lemma 1.

Let $\phi:\mathbb{R}\rightarrow\mathbb{R}$ be a continuous function which is right-differentiable. Assume that

$$\forall t\geqslant t_0,\ \phi_+(t)\leqslant\psi(t),\tag{38}$$

where $\phi_+(t)=\lim_{h\rightarrow 0,\,h>0}\frac{\phi(t+h)-\phi(t)}{h}$ denotes the right derivative of $\phi$ at $t$. Then,

$$\forall t\geqslant t_0,\ \phi(t)\leqslant\phi(t_0)+\int_{t_0}^{t}\psi(u)\,du.\tag{39}$$
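For instance, here is one typical way Lemma 1 turns a differential inequality into exponential decay. This is a sketch under a hypothetical assumption, namely that $D(t)+\eta\,\mathcal{E}^*(t)\leqslant 0$ for some $\eta>0$ and all $t\geqslant t_0$. The actual rates of Theorem 3 come from the original proofs and are not reproduced here.

```latex
% Hypothetical assumption, for illustration only: for some \eta > 0,
%   D(t) + \eta\,\mathcal{E}^*(t) \leqslant 0 for all t \geqslant t_0.
% Set \phi(t) = e^{\eta t}\mathcal{E}^*(t); it is continuous and right-differentiable, and,
% using \dot{\mathcal{E}}^*(t) \leqslant D(t),
\phi_+(t) = e^{\eta t}\bigl(\dot{\mathcal{E}}^*(t) + \eta\,\mathcal{E}^*(t)\bigr)
          \leqslant e^{\eta t}\bigl(D(t) + \eta\,\mathcal{E}^*(t)\bigr)
          \leqslant 0 =: \psi(t).
% Lemma 1 with \psi \equiv 0 gives \phi(t) \leqslant \phi(t_0), that is,
\mathcal{E}^*(t) \leqslant e^{-\eta (t - t_0)}\,\mathcal{E}^*(t_0), \qquad t \geqslant t_0.
```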

5 Proofs of Theorems 1 and 2

The proofs of Theorems 1 and 2 are built around the following discrete Lyapunov energies

$$\mathcal{E}_n=\frac{2}{L}(F(x_n)-F^*)+\left\|\lambda(x_{n-1}-x_{n-1}^*)+x_n-x_{n-1}\right\|^2,\tag{40}$$

and

$$\mathcal{E}_n=\frac{2}{L}(F(x_n)-F^*)+\alpha\left\|\lambda(x_n-x_n^*)+x_n-x_{n-1}\right\|^2+\lambda(1-\alpha)^2\|x_n-x_n^*\|^2,\tag{41}$$

where $\lambda>0$ and $x_n^*$ denotes the projection of $x_n$ onto $X^*$ for any $n\in\mathbb{N}$. For the sake of clarity, we use the following notation:

$$w_n=\frac{2}{L}(F(x_n)-F^*),\quad h_n=\|x_n-x_n^*\|^2,\quad \delta_n=\|x_n-x_{n-1}\|^2,\quad \gamma_n^*=\|x_n^*-x_{n-1}^*\|^2.\tag{42}$$

The following lemma is needed to handle the terms related to the non-uniqueness of the minimizers. A proof is given in Section A.3.

Lemma 2.

For all $n\in\mathbb{N}^*$, the following equalities hold:

  1. $\langle x_n-x_n^*,x_n-x_{n-1}\rangle=\frac{1}{2}(h_n-h_{n-1}+\delta_n-\gamma_n^*)+\langle x_{n-1}-x_{n-1}^*,x_n^*-x_{n-1}^*\rangle.$

  2. $\langle x_{n-1}-x_{n-1}^*,x_n-x_{n-1}\rangle=\frac{1}{2}(h_n-h_{n-1}-\delta_n+\gamma_n^*)+\langle x_n-x_n^*,x_n^*-x_{n-1}^*\rangle.$
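Lemma 2 is in fact purely algebraic. Both identities follow from expanding squares after writing $x_n-x_{n-1}=(x_n-x_n^*)-(x_{n-1}-x_{n-1}^*)+(x_n^*-x_{n-1}^*)$, so they hold for any four vectors. The sketch below checks them on random points. It uses the projection onto a box as a hypothetical stand-in for $X^*$, and the notation (42) for $h_n$, $\delta_n$ and $\gamma_n^*$.

```python
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def sub(a, b):
    return [ai - bi for ai, bi in zip(a, b)]


def proj_box(x, lo=-1.0, hi=1.0):
    """Projection onto [lo, hi]^N, a hypothetical stand-in for X*."""
    return [min(max(xi, lo), hi) for xi in x]


def lemma2_max_error(trials=1000, dim=4, seed=3):
    """Largest absolute discrepancy between the two sides of both identities of Lemma 2."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        x_prev = [rng.uniform(-3, 3) for _ in range(dim)]    # x_{n-1}
        x_cur = [rng.uniform(-3, 3) for _ in range(dim)]     # x_n
        p_prev, p_cur = proj_box(x_prev), proj_box(x_cur)    # x_{n-1}^*, x_n^*
        r_prev, r_cur = sub(x_prev, p_prev), sub(x_cur, p_cur)
        step, dp = sub(x_cur, x_prev), sub(p_cur, p_prev)
        # notation (42): h_n, h_{n-1}, delta_n, gamma_n^*
        h_cur, h_prev = dot(r_cur, r_cur), dot(r_prev, r_prev)
        delta, gamma = dot(step, step), dot(dp, dp)
        lhs1 = dot(r_cur, step)
        rhs1 = 0.5 * (h_cur - h_prev + delta - gamma) + dot(r_prev, dp)
        lhs2 = dot(r_prev, step)
        rhs2 = 0.5 * (h_cur - h_prev - delta + gamma) + dot(r_cur, dp)
        worst = max(worst, abs(lhs1 - rhs1), abs(lhs2 - rhs2))
    return worst


print(f"largest discrepancy: {lemma2_max_error():.1e}")
```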

Moreover, we introduce a lemma which encodes the fact that the sequence $(x_n)_{n\in\mathbb{N}}$ is generated by (V-FISTA). Its proof is based on the descent lemma proved in [16] and can be found in Appendix A.4.

Lemma 3.

Let $(x_n)_{n\in\mathbb{N}}$ be the sequence provided by (V-FISTA) with $s=\frac{1}{L}$. Then, for any $n\in\mathbb{N}^*$,

$$\begin{aligned}w_{n+1}-w_n&\leqslant\alpha^2\delta_n-\delta_{n+1},\\ w_{n+1}&\leqslant(1+\alpha)h_n+(\alpha^2+\alpha)\delta_n-\alpha h_{n-1}-h_{n+1}-\gamma_{n+1}^*-\alpha\gamma_n^*\\&\quad+2\alpha\langle x_{n-1}-x_{n-1}^*,x_n^*-x_{n-1}^*\rangle-2\langle x_{n+1}-x_{n+1}^*,x_{n+1}^*-x_n^*\rangle.\end{aligned}$$
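Lemma 3 can be checked along an actual run. The sketch below rests on several assumptions made only for this illustration. It uses the smooth toy function $F(x)=\frac12(x_0+x_1-1)^2+\frac{\mu}{2}x_2^2$, so that $L=2$ and the nonsmooth part is zero, which makes the proximal step the identity. It also takes Beck's momentum parameter $\alpha=\frac{1-\sqrt{\kappa}}{1+\sqrt{\kappa}}$ with $\kappa=\mu/L$, and the initialization $x_{-1}=x_0$. V-FISTA is implemented in its usual form (assumed here) $y_n=x_n+\alpha(x_n-x_{n-1})$, $x_{n+1}=y_n-s\nabla F(y_n)$ with $s=1/L$. The code then reports the smallest slack (right-hand side minus left-hand side) over both inequalities.

```python
import math

MU, L = 0.1, 2.0                                          # toy problem constants
KAPPA = MU / L                                            # assumed definition of kappa
ALPHA = (1 - math.sqrt(KAPPA)) / (1 + math.sqrt(KAPPA))   # Beck's momentum, an assumption here
S = 1.0 / L                                               # step size s = 1/L, as in Lemma 3


def F(x):
    r = x[0] + x[1] - 1.0
    return 0.5 * r * r + 0.5 * MU * x[2] ** 2             # F* = 0, minimizers {x0 + x1 = 1, x2 = 0}


def grad_F(x):
    r = x[0] + x[1] - 1.0
    return [r, r, MU * x[2]]


def proj(x):
    r = x[0] + x[1] - 1.0
    return [x[0] - r / 2, x[1] - r / 2, 0.0]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def sub(a, b):
    return [ai - bi for ai, bi in zip(a, b)]


def v_fista(x0, n_iter):
    """Momentum step then gradient step; the nonsmooth part is zero, so the prox is the identity."""
    xs, prev = [list(x0)], list(x0)                       # convention x_{-1} = x_0 (assumed)
    for _ in range(n_iter):
        cur = xs[-1]
        y = [c + ALPHA * (c - p) for c, p in zip(cur, prev)]
        xs.append([yi - S * gi for yi, gi in zip(y, grad_F(y))])
        prev = cur
    return xs


xs = v_fista([3.0, -1.0, 2.0], 60)
ps = [proj(x) for x in xs]
w = [2 / L * F(x) for x in xs]
h = [dot(sub(x, p), sub(x, p)) for x, p in zip(xs, ps)]
delta = [0.0] + [dot(sub(xs[n], xs[n - 1]), sub(xs[n], xs[n - 1])) for n in range(1, len(xs))]
gamma = [0.0] + [dot(sub(ps[n], ps[n - 1]), sub(ps[n], ps[n - 1])) for n in range(1, len(xs))]


def lemma3_slacks(n):
    """Right-hand side minus left-hand side of both inequalities of Lemma 3 at index n >= 1."""
    first = (ALPHA ** 2 * delta[n] - delta[n + 1]) - (w[n + 1] - w[n])
    second = ((1 + ALPHA) * h[n] + (ALPHA ** 2 + ALPHA) * delta[n] - ALPHA * h[n - 1]
              - h[n + 1] - gamma[n + 1] - ALPHA * gamma[n]
              + 2 * ALPHA * dot(sub(xs[n - 1], ps[n - 1]), sub(ps[n], ps[n - 1]))
              - 2 * dot(sub(xs[n + 1], ps[n + 1]), sub(ps[n + 1], ps[n]))) - w[n + 1]
    return first, second


smallest = min(min(lemma3_slacks(n)) for n in range(1, len(xs) - 1))
print(f"smallest slack over the run: {smallest:.1e}")   # nonnegative up to rounding
```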

We point out that several bounds are deduced from the properties of the projection onto a convex set. Indeed, if $C\subset\mathbb{R}^N$ is a closed convex set, then for any $x\in\mathbb{R}^N$ and $y\in C$,

$$\langle x-p,y-p\rangle\leqslant 0,$$

where $p$ denotes the projection of $x$ onto $C$. This property directly guarantees inequalities such as

$$\langle x_n-x_n^*,x_n^*-x_{n-1}^*\rangle\geqslant 0,$$

or

$$\langle x_{n-1}-x_{n-1}^*,x_n^*-x_{n-1}^*\rangle\leqslant 0.$$
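The two projection inequalities can be checked on random data. In this sketch, a box stands in for the closed convex set $X^*$, which is an assumption made for illustration. For a box, each coordinate contributes a term of the right sign, so the checks hold exactly, without any rounding tolerance.

```python
import random


def proj_box(x, lo=-1.0, hi=1.0):
    """Projection onto the closed convex set C = [lo, hi]^N (a stand-in for X*)."""
    return [min(max(xi, lo), hi) for xi in x]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def sub(a, b):
    return [ai - bi for ai, bi in zip(a, b)]


rng = random.Random(4)
min_first, max_second = float("inf"), float("-inf")
for _ in range(10000):
    x_prev = [rng.uniform(-3, 3) for _ in range(5)]             # x_{n-1}
    x_cur = [rng.uniform(-3, 3) for _ in range(5)]              # x_n
    p_prev, p_cur = proj_box(x_prev), proj_box(x_cur)           # x_{n-1}^*, x_n^*
    dp = sub(p_cur, p_prev)                                      # x_n^* - x_{n-1}^*
    min_first = min(min_first, dot(sub(x_cur, p_cur), dp))       # expected >= 0
    max_second = max(max_second, dot(sub(x_prev, p_prev), dp))   # expected <= 0
print(min_first, max_second)
```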

5.1 Proof of Theorem 1

Let $(x_n)_{n\in\mathbb{N}}$ be the sequence provided by (V-FISTA) for some $\alpha>0$ to be defined. We define the following discrete Lyapunov energy:

$$\mathcal{E}_n=\frac{2}{L}\big(F(x_n)-F^*\big)+\|x_n-x_{n-1}+\lambda(x_{n-1}-x_{n-1}^*)\|^2,\tag{43}$$

where $\lambda>0$ and, for any $n\in\mathbb{N}$, $x_n^*$ denotes the projection of $x_n$ onto $X^*$. Setting $\lambda=\sqrt{\kappa}$ and assuming that $F$ has a unique minimizer, we recover the energy considered by Beck in [10].

The aim of this proof is to find $\tau>0$ as large as possible such that, for a well-chosen set of parameters,

$$\mathcal{E}_{n+1}-(1-\tau\sqrt{\kappa})\mathcal{E}_n\leqslant 0.\tag{44}$$

The proof is divided into three parts. We first use the lemmas stated at the beginning of Section 5 and the properties of the projection onto a convex set to handle the terms arising from the non-uniqueness of the minimizers. Then, using the geometry assumption satisfied by $F$, we give a set of parameters that yields the desired inequality (44). Finally, the convergence of the trajectories is obtained from the previous results and elementary computations.

5.1.1 Preliminary work

Recall the notation defined in (42). Rewriting (43) and using the second claim of Lemma 2, we obtain:

$$\mathcal{E}_n=w_n+(1-\lambda)\delta_n+\lambda(h_n-h_{n-1})+\lambda^2h_{n-1}+\lambda\gamma_n^*+2\lambda\langle x_n-x_n^*,x_n^*-x_{n-1}^*\rangle.\tag{45}$$
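Identity (45) is purely algebraic: it holds for arbitrary vectors $x_n$, $x_{n-1}$, $x_n^*$, $x_{n-1}^*$, not only for projections. The sketch below checks it numerically under our reading of the notation (42), which is not reproduced in this section: $\delta_n=\|x_n-x_{n-1}\|^2$, $h_n=\|x_n-x_n^*\|^2$, $\gamma_n^*=\|x_n^*-x_{n-1}^*\|^2$, and $w_n=\frac{2}{L}(F(x_n)-F^*)$, treated here as an arbitrary nonnegative number. The function names are hypothetical.

```python
import random

# Assumed reading of notation (42):
# delta_n = ||x_n - x_{n-1}||^2, h_n = ||x_n - x_n^*||^2,
# gamma_n^* = ||x_n^* - x_{n-1}^*||^2, w_n = (2/L)(F(x_n) - F^*).

def sqnorm(u):
    return sum(a * a for a in u)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def energy_43(w, x, x_prev, x_prev_star, lam):
    """E_n as defined in (43)."""
    v = [a - b + lam * (b - c) for a, b, c in zip(x, x_prev, x_prev_star)]
    return w + sqnorm(v)

def energy_45(w, x, x_prev, x_star, x_prev_star, lam):
    """Right-hand side of the decomposition (45)."""
    delta = sqnorm(sub(x, x_prev))
    h, h_prev = sqnorm(sub(x, x_star)), sqnorm(sub(x_prev, x_prev_star))
    gamma = sqnorm(sub(x_star, x_prev_star))
    cross = dot(sub(x, x_star), sub(x_star, x_prev_star))
    return (w + (1 - lam) * delta + lam * (h - h_prev) + lam ** 2 * h_prev
            + lam * gamma + 2 * lam * cross)

def max_gap(n_trials=200, dim=4, seed=0):
    """Largest |(43) - (45)| over random vectors; should be at rounding level."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_trials):
        x, x_prev, x_star, x_prev_star = (
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(4))
        w, lam = rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)
        gap = energy_43(w, x, x_prev, x_prev_star, lam) - energy_45(
            w, x, x_prev, x_star, x_prev_star, lam)
        worst = max(worst, abs(gap))
    return worst

print(max_gap())
```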

Lemma 3 ensures that, if $\lambda\leqslant 1$,

$$\begin{aligned}w_{n+1}-(1-\lambda)w_n\leqslant{}&\alpha(\alpha+\lambda)\delta_n-(1-\lambda)\delta_{n+1}+\alpha\lambda(h_n-h_{n-1})+\lambda(h_n-h_{n+1})\\&-\lambda\gamma_{n+1}^*-\alpha\lambda\gamma_n^*+2\alpha\lambda\langle x_{n-1}-x_{n-1}^*,x_n^*-x_{n-1}^*\rangle\\&-2\lambda\langle x_{n+1}-x_{n+1}^*,x_{n+1}^*-x_n^*\rangle.\end{aligned}\tag{46}$$

This inequality combined with (45) ensures that:

$$\mathcal{E}_{n+1}-(1-\lambda)\mathcal{E}_n\leqslant a_1\delta_n+a_2(h_n-h_{n-1})+a_3h_n+\mathcal{X}_n^*,\tag{47}$$

where:

$$a_1=\alpha(\alpha+\lambda)-(1-\lambda)^2,\quad a_2=\alpha\lambda-\lambda(1-\lambda)+(1-\lambda)\lambda^2,\quad a_3=\lambda^3,$$

and $\mathcal{X}_n^*$ is defined by

$$\begin{aligned}\mathcal{X}_n^*={}&-\lambda(1-\lambda+\alpha)\gamma_n^*+2\alpha\lambda\langle x_{n-1}-x_{n-1}^*,x_n^*-x_{n-1}^*\rangle\\&-2\lambda(1-\lambda)\langle x_n-x_n^*,x_n^*-x_{n-1}^*\rangle.\end{aligned}$$

By the properties of the projection onto a convex set, for all $n\in\mathbb{N}$:

$$\begin{cases}\langle x_{n-1}-x_{n-1}^*,x_n^*-x_{n-1}^*\rangle\leqslant 0,\\\langle x_n-x_n^*,x_n^*-x_{n-1}^*\rangle\geqslant 0,\end{cases}$$

and since $\gamma_n^*\geqslant 0$, $\alpha>0$ and $0<\lambda\leqslant 1$, each of the three terms of $\mathcal{X}_n^*$ is nonpositive. Hence $\mathcal{X}_n^*\leqslant 0$ and consequently:

$$\mathcal{E}_{n+1}-(1-\lambda)\mathcal{E}_n\leqslant a_1\delta_n+a_2(h_n-h_{n-1})+a_3h_n.\tag{48}$$

5.1.2 Getting the convergence rate

Recall that we want to find $\tau>0$ such that $\mathcal{E}_{n+1}-(1-\tau\sqrt{\kappa})\mathcal{E}_n\leqslant 0$. We choose the following set of parameters:

$$\tau=\frac{2}{3\sqrt{3}},\quad\lambda=\frac{1}{\sqrt{3}}\sqrt{\kappa},\quad\alpha=1-\frac{5}{3\sqrt{3}}\sqrt{\kappa}=1-\frac{5}{3}\lambda.$$

Then we get that:

$$\begin{aligned}\mathcal{E}_{n+1}-(1-\tau\sqrt{\kappa})\mathcal{E}_n&=\mathcal{E}_{n+1}-(1-\lambda)\mathcal{E}_n+\left(\frac{2}{3\sqrt{3}}-\frac{1}{\sqrt{3}}\right)\sqrt{\kappa}\,\mathcal{E}_n\\&\leqslant a_1\delta_n+a_2(h_n-h_{n-1})+a_3h_n-\frac{1}{3\sqrt{3}}\sqrt{\kappa}\,\mathcal{E}_n,\end{aligned}\tag{49}$$

where for this parameter choice we have:

$$\begin{aligned}a_1&=-\frac{\lambda}{3}\left(1-\frac{\lambda}{3}\right)=\frac{\sqrt{\kappa}}{27}\big(\sqrt{\kappa}-3\sqrt{3}\big),\qquad a_3=\lambda^3=\frac{\kappa^{3/2}}{3\sqrt{3}},\\a_2&=\lambda^2\left(\frac{1}{3}-\lambda\right)=\frac{\kappa}{3\sqrt{3}}\left(\frac{1}{\sqrt{3}}-\sqrt{\kappa}\right).\end{aligned}$$
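These closed forms, and the sign conditions used next, can be double-checked numerically. The sketch below evaluates the definitions of $a_1,a_2,a_3$ from (47) at the chosen $\tau$, $\lambda$, $\alpha$, compares them with the closed forms above, and checks that $\tau\sqrt{\kappa}-\lambda=-\frac{1}{3\sqrt{3}}\sqrt{\kappa}$, the coefficient of $\mathcal{E}_n$ in (49). The function names are illustrative only.

```python
import math

def coefficients(kappa):
    """a1, a2, a3 of (47), evaluated at lambda and alpha from Section 5.1.2."""
    lam = math.sqrt(kappa) / math.sqrt(3)
    alpha = 1 - 5 * lam / 3
    a1 = alpha * (alpha + lam) - (1 - lam) ** 2
    a2 = alpha * lam - lam * (1 - lam) + (1 - lam) * lam ** 2
    a3 = lam ** 3
    return a1, a2, a3

def closed_forms(kappa):
    """The closed forms stated in the text, as functions of kappa."""
    s, r3 = math.sqrt(kappa), math.sqrt(3)
    a1 = s / 27 * (s - 3 * r3)
    a2 = kappa / (3 * r3) * (1 / r3 - s)
    a3 = kappa ** 1.5 / (3 * r3)
    return a1, a2, a3

def rate_shift(kappa):
    """tau * sqrt(kappa) - lambda; expected to equal -sqrt(kappa) / (3 sqrt(3))."""
    tau = 2 / (3 * math.sqrt(3))
    return tau * math.sqrt(kappa) - math.sqrt(kappa) / math.sqrt(3)

for kappa in [1e-4, 0.01, 0.1, 1 / 3]:
    print(kappa, coefficients(kappa), closed_forms(kappa))
```

At $\kappa=\frac13$ one gets $\lambda=\frac13$ and $a_2=0$, which is why $\kappa\leqslant\frac13$ is exactly the range where $a_2\geqslant 0$.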

Under the condition $\kappa\leqslant\frac{1}{3}$, we have $a_1\leqslant 0$ and hence,

$$\mathcal{E}_{n+1}-\left(1-\frac{2}{3\sqrt{3}}\sqrt{\kappa}\right)\mathcal{E}_n\leqslant\frac{\kappa}{3\sqrt{3}}\left(\frac{1}{\sqrt{3}}-\sqrt{\kappa}\right)(h_n-h_{n-1})+\frac{\kappa^{3/2}}{3\sqrt{3}}h_n-\frac{1}{3\sqrt{3}}\sqrt{\kappa}\,\mathcal{E}_n.\tag{50}$$

Moreover, since the condition $\kappa\leqslant\frac{1}{3}$ ensures that $a_2\geqslant 0$, we can bound $a_2(h_n-h_{n-1})$ from above using the following lemma, which is proved in Section A.5.

Lemma 4.

Let $(x_n)_{n\in\mathbb{N}}$ be the sequence provided by (V-FISTA) and $\lambda=\frac{1}{\sqrt{3}}\sqrt{\kappa}$. Then for all $n\in\mathbb{N}$:

$$h_n-h_{n-1}\leqslant\frac{\sqrt{3}}{\sqrt{\kappa}}\left(\mathcal{E}_n-w_n\right).\tag{51}$$

Combining Lemma 4 with (50), we get that, if $\kappa\leqslant\frac{1}{3}$:

$$\mathcal{E}_{n+1}-\left(1-\frac{2}{3\sqrt{3}}\sqrt{\kappa}\right)\mathcal{E}_n\leqslant\frac{\kappa^{3/2}}{3\sqrt{3}}h_n-\frac{\sqrt{\kappa}}{3}\left(\frac{1}{\sqrt{3}}-\sqrt{\kappa}\right)w_n-\frac{\kappa}{3}\mathcal{E}_n.\tag{52}$$

Moreover, as $F$ satisfies $\mathcal{G}^2_\mu$, we have $h_n\leqslant\frac{w_n}{\kappa}$ and consequently:

$$\begin{aligned}\mathcal{E}_{n+1}-\left(1-\frac{2}{3\sqrt{3}}\sqrt{\kappa}\right)\mathcal{E}_n&\leqslant\left(\frac{\sqrt{\kappa}}{3\sqrt{3}}-\frac{\sqrt{\kappa}}{3}\left(\frac{1}{\sqrt{3}}-\sqrt{\kappa}\right)\right)w_n-\frac{\kappa}{3}\mathcal{E}_n\\&\leqslant\frac{\kappa}{3}w_n-\frac{\kappa}{3}\mathcal{E}_n.\end{aligned}\tag{53}$$
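For completeness, the two facts used in (53) can be spelled out. We assume here (our reading of the notation, consistent with the bound $h_n\leqslant\frac{w_n}{\kappa}$ stated above) that $\mathcal{G}^2_\mu$ means $F(x)-F^*\geqslant\frac{\mu}{2}d(x,X^*)^2$ for all $x$, and that $\kappa=\frac{\mu}{L}$. Then

$$w_n=\frac{2}{L}\big(F(x_n)-F^*\big)\geqslant\frac{\mu}{L}\|x_n-x_n^*\|^2=\kappa h_n,\qquad\text{so}\qquad\frac{\kappa^{3/2}}{3\sqrt{3}}h_n\leqslant\frac{\sqrt{\kappa}}{3\sqrt{3}}w_n,$$

and the coefficient of $w_n$ in the first line of (53) simplifies exactly:

$$\frac{\sqrt{\kappa}}{3\sqrt{3}}-\frac{\sqrt{\kappa}}{3}\left(\frac{1}{\sqrt{3}}-\sqrt{\kappa}\right)=\frac{\sqrt{\kappa}}{3\sqrt{3}}-\frac{\sqrt{\kappa}}{3\sqrt{3}}+\frac{\kappa}{3}=\frac{\kappa}{3}.$$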

Noticing that $w_n\leqslant\mathcal{E}_n$, we can conclude that:

$$\mathcal{E}_{n+1}-\left(1-\frac{2}{3\sqrt{3}}\sqrt{\kappa}\right)\mathcal{E}_n\leqslant 0.\tag{54}$$

Hence $\mathcal{E}_n\leqslant\left(1-\frac{2}{3\sqrt{3}}\sqrt{\kappa}\right)^n\mathcal{E}_0$. Since we take $x_{-1}=x_0$, we have $\mathcal{E}_0=w_0+\lambda^2h_0$. As a consequence, the geometry condition $\mathcal{G}^2_\mu$, which gives $\lambda^2h_0=\frac{\kappa}{3}h_0\leqslant\frac{w_0}{3}$, ensures that $\mathcal{E}_0\leqslant\frac{4}{3}w_0$ and, since $w_n\leqslant\mathcal{E}_n$,

$$F(x_n)-F^*\leqslant\frac{4}{3}\left(1-\frac{2}{3\sqrt{3}}\sqrt{\kappa}\right)^n\big(F(x_0)-F^*\big).\tag{55}$$
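As a numerical sanity check of (54) and (55), the sketch below runs a smooth instance of (V-FISTA) on a hypothetical test problem of our own choosing: $F(x)=\frac12\sum_i d_ix_i^2$ with one $d_i=0$. This $F$ is convex and $L$-smooth with $L=\max_i d_i$, is not strongly convex, has $F^*=0$ and minimizer set $X^*=\{x: x_i=0 \text{ whenever } d_i>0\}$, and satisfies quadratic growth with $\mu=\min\{d_i: d_i>0\}$. We assume that, without a nonsmooth part, (V-FISTA) reduces to $y_n=x_n+\alpha(x_n-x_{n-1})$, $x_{n+1}=y_n-\frac1L\nabla F(y_n)$ with $x_{-1}=x_0$; all helper names are hypothetical.

```python
import math

# Hypothetical test problem: F(x) = 0.5 * sum_i d_i x_i^2, some d_i = 0.

def F(d, x):
    return 0.5 * sum(di * xi * xi for di, xi in zip(d, x))

def project(d, x):
    """Euclidean projection onto X* (zero out the coordinates with d_i > 0)."""
    return [0.0 if di > 0 else xi for di, xi in zip(d, x)]

def run_vfista(d, x0, alpha, L, n_iter):
    """Assumed smooth form of (V-FISTA): y = x_n + alpha (x_n - x_{n-1}),
    x_{n+1} = y - grad F(y) / L, started with x_{-1} = x_0."""
    x_prev, x = list(x0), list(x0)
    iterates = [list(x0)]
    for _ in range(n_iter):
        y = [xi + alpha * (xi - pi) for xi, pi in zip(x, x_prev)]
        x_prev, x = x, [yi - di * yi / L for yi, di in zip(y, d)]  # grad_i = d_i y_i
        iterates.append(x)
    return iterates

def energy(d, L, lam, x, x_prev):
    """Discrete Lyapunov energy (43), using F* = 0."""
    p_prev = project(d, x_prev)
    v = [a - b + lam * (b - c) for a, b, c in zip(x, x_prev, p_prev)]
    return 2.0 / L * F(d, x) + sum(t * t for t in v)

d = [0.0, 0.01, 0.1, 0.5, 1.0]
L = max(d)
kappa = min(di for di in d if di > 0) / L      # kappa = mu / L = 0.01 <= 1/3
lam = math.sqrt(kappa) / math.sqrt(3)          # parameter choice of Section 5.1.2
alpha = 1 - 5 * lam / 3
rho = 1 - 2 / (3 * math.sqrt(3)) * math.sqrt(kappa)

xs = run_vfista(d, [1.0] * len(d), alpha, L, n_iter=300)
# E_n involves x_{n-1}; for n = 0 the convention x_{-1} = x_0 applies.
E = [energy(d, L, lam, xs[n], xs[max(n - 1, 0)]) for n in range(len(xs))]
gaps = [F(d, x) for x in xs]                   # F(x_n) - F*, since F* = 0

print(max(E[n + 1] / E[n] for n in range(len(E) - 1)), rho)
```

The printed pair compares the worst observed one-step contraction of $\mathcal{E}_n$ with the factor $1-\frac{2}{3\sqrt3}\sqrt\kappa$ of (54). Problems with $\kappa>\frac13$ fall outside the parameter regime of this section.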

5.1.3 Convergence of the trajectories

Let $b_n=\|x_n-x_{n-1}+\lambda(x_{n-1}-x_{n-1}^*)\|^2$. Using the inequality $\|u\|^2\leqslant 2\|u+v\|^2+2\|v\|^2$ with $u=x_n-x_{n-1}$ and $v=\lambda(x_{n-1}-x_{n-1}^*)$, we get $\delta_n\leqslant 2b_n+2\lambda^2h_{n-1}$; since $\lambda\leqslant 1$, we have $\lambda^2\leqslant\frac{1}{\lambda^2}$ and hence:

$$\delta_n\leqslant 2b_n+\frac{2}{\lambda^2}h_{n-1}.\tag{56}$$

Thus, since $b_n\leqslant\mathcal{E}_n$ by definition of $\mathcal{E}_n$, and $h_{n-1}\leqslant\frac{w_{n-1}}{\kappa}$ by the geometry condition $\mathcal{G}^2_\mu$:

$$\delta_n\leqslant 2\mathcal{E}_n+\frac{2}{\lambda^2\kappa}w_{n-1}.\tag{57}$$

Then, using $w_{n-1}\leqslant\mathcal{E}_{n-1}$ and inequality (54), we deduce that:

$$\delta_n\leqslant\left(2\left(1-\frac{2}{3\sqrt{3}}\sqrt{\kappa}\right)+\frac{2}{\lambda^2\kappa}\right)\mathcal{E}_{n-1},\tag{58}$$

and consequently,

$$\delta_n\leqslant\left(2\left(1-\frac{2}{3\sqrt{3}}\sqrt{\kappa}\right)+\frac{2}{\lambda^2\kappa}\right)\left(1-\frac{2}{3\sqrt{3}}\sqrt{\kappa}\right)^{n-1}\mathcal{E}_0.\tag{59}$$

Hence,

$$\|x_n-x_{n-1}\|=\mathcal{O}\left(e^{-\frac{1}{3\sqrt{3}}\sqrt{\kappa}\,n}\right).\tag{60}$$
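The step from (59) to (60) only uses the elementary bound $1-u\leqslant e^{-u}$ for $u\geqslant 0$. Writing $\tau=\frac{2}{3\sqrt{3}}$ and abbreviating the constant of (59) as $C=\big(2(1-\tau\sqrt{\kappa})+\frac{2}{\lambda^2\kappa}\big)\mathcal{E}_0$ (a shorthand introduced only here), a worked version reads:

$$\|x_n-x_{n-1}\|=\sqrt{\delta_n}\leqslant\sqrt{C}\,(1-\tau\sqrt{\kappa})^{\frac{n-1}{2}}\leqslant\sqrt{C}\,e^{-\frac{\tau}{2}\sqrt{\kappa}\,(n-1)}=\sqrt{C}\,e^{\frac{\sqrt{\kappa}}{3\sqrt{3}}}\,e^{-\frac{1}{3\sqrt{3}}\sqrt{\kappa}\,n},$$

since $\frac{\tau}{2}=\frac{1}{3\sqrt{3}}$. In particular the increments $\|x_n-x_{n-1}\|$ are summable, so $(x_n)_{n\in\mathbb{N}}$ is a Cauchy sequence and therefore converges.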

5.2 Proof of Theorem 2

5.2.1 Structure of the proof

The proof of Theorem 2 follows the approach of [8], which establishes convergence rates for the trajectories of the Heavy Ball with friction system:

$$\ddot{x}(t)+\alpha_c\dot{x}(t)+\nabla F(x(t))=0,\tag{HBF}$$

for some $\alpha_c>0$. The sequence generated by (V-FISTA) can be seen as a discretization of (HBF) (when $F$ is differentiable), so the strategy can be adapted to the discrete setting. Note that the parameter $\alpha_c$ in (HBF) does not play the same role as $\alpha$ in (V-FISTA): $\alpha_c$ behaves like $\sqrt{L}(1-\alpha)$.
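To see where the correspondence $\alpha_c\approx\sqrt{L}(1-\alpha)$ comes from, here is a heuristic sketch of our own, assuming $F$ differentiable with no nonsmooth part, a time step $h=1/\sqrt{L}$ with $x_n\approx x(nh)$, and the gradient evaluated at the extrapolated point $y_n=x_n+\alpha(x_n-x_{n-1})$. Replacing the derivatives in (HBF) by finite differences gives

$$\frac{x_{n+1}-2x_n+x_{n-1}}{h^2}+\alpha_c\,\frac{x_n-x_{n-1}}{h}+\nabla F(y_n)=0\iff x_{n+1}=x_n+(1-\alpha_ch)(x_n-x_{n-1})-h^2\nabla F(y_n).$$

With $h^2=\frac1L$, this is the gradient step $x_{n+1}=y_n-\frac{1}{L}\nabla F(y_n)$ provided $\alpha=1-\frac{\alpha_c}{\sqrt{L}}$, that is, $\alpha_c=\sqrt{L}(1-\alpha)$.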
This proof relies on the analysis of the following Lyapunov energy:

$$\forall n\in\mathbb{N},\quad\mathcal{E}_n=\frac{2}{L}\big(F(x_n)-F^*\big)+\alpha\|x_n-x_{n-1}+\lambda(x_n-x_n^*)\|^2+\lambda(1-\alpha)^2\|x_n-x_n^*\|^2.\tag{61}$$

The strategy of the proof is straightforward: we aim to find a set of parameters $(\alpha,\lambda,\nu)\in(\mathbb{R}^+)^3$ such that, for any $n\in\mathbb{N}$,

$$\mathcal{E}_{n+1}-\mathcal{E}_n+\nu\mathcal{E}_{n+1}\leqslant 0.\tag{62}$$

Simple calculations show that this ensures

$$\forall n\in\mathbb{N},\quad\mathcal{E}_n\leqslant\left(1-\nu+\nu^2\right)^n\mathcal{E}_0,\tag{63}$$

which leads us to the conclusion.
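The simple calculations mentioned above reduce to a one-line argument. It is valid for any $\nu\geqslant 0$:

$$\mathcal{E}_{n+1}\leqslant\frac{\mathcal{E}_n}{1+\nu}\leqslant\left(1-\nu+\nu^2\right)\mathcal{E}_n,\qquad\text{since }(1+\nu)(1-\nu+\nu^2)=1+\nu^3\geqslant 1.$$

Iterating this from $n=0$ gives (63).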

5.2.2 Proof

Let $(x_n)_{n\in\mathbb{N}}$ be the sequence provided by (V-FISTA). Following the notation introduced in (42), we can rewrite:

$$\forall n\in\mathbb{N},\quad\mathcal{E}_n=w_n+\lambda(\lambda\alpha+(1-\alpha)^2)h_n+\alpha\delta_n+2\alpha\lambda\langle x_n-x_n^*,x_n-x_{n-1}\rangle.\tag{64}$$
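As a quick sanity check of the rewriting (64), the sketch below evaluates the energy both directly from (61) and through the expanded form (64), using random vectors. We read $w_n=\frac{2}{L}(F(x_n)-F^*)$, $h_n=\|x_n-x_n^*\|^2$ and $\delta_n=\|x_n-x_{n-1}\|^2$, which is the reading consistent with (61). The function names and all numeric values (including the placeholder value of the gap $F(x_n)-F^*$) are hypothetical and chosen only for illustration.

```python
import numpy as np


def energy_61(gap, x, x_prev, x_proj, alpha, lam, L):
    """Lyapunov energy (61); `gap` stands for F(x_n) - F^*."""
    m = x - x_prev + lam * (x - x_proj)
    d = x - x_proj
    return (2.0 / L) * gap + alpha * (m @ m) + lam * (1 - alpha) ** 2 * (d @ d)


def energy_64(gap, x, x_prev, x_proj, alpha, lam, L):
    """The same energy expanded with w_n, h_n and delta_n as in (64)."""
    w = (2.0 / L) * gap                  # w_n
    h = (x - x_proj) @ (x - x_proj)      # h_n
    delta = (x - x_prev) @ (x - x_prev)  # delta_n
    cross = (x - x_proj) @ (x - x_prev)  # <x_n - x_n^*, x_n - x_{n-1}>
    return (w + lam * (lam * alpha + (1 - alpha) ** 2) * h
            + alpha * delta + 2 * alpha * lam * cross)


rng = np.random.default_rng(0)
x, x_prev, x_proj = rng.standard_normal((3, 5))
# Arbitrary illustrative values: gap, x_n, x_{n-1}, x_n^*, alpha, lambda, L
args = (0.7, x, x_prev, x_proj, 0.85, 0.1, 2.0)
print(energy_61(*args), energy_64(*args))  # equal up to rounding
```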

Following the first claim of 2,

$$\begin{aligned}\mathcal{E}_n={}&w_n+\lambda(\lambda\alpha+(1-\alpha)^2)h_n+\lambda\alpha(h_n-h_{n-1})+(1+\lambda)\alpha\delta_n\\&-\lambda\alpha\gamma_n^*+2\lambda\alpha\langle x_{n-1}-x_{n-1}^*,x_n^*-x_{n-1}^*\rangle.\end{aligned}\tag{65}$$

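Observe that $x_{n-1}^*$ is the projection of $x_{n-1}$ onto the closed convex set $X^*$, and that $x_n^*\in X^*$. By the properties of the projection onto a convex set, $\langle x_{n-1}-x_{n-1}^*,x_n^*-x_{n-1}^*\rangle\leqslant 0$. Since moreover $\gamma_n^*\geqslant 0$, for any $n\in\mathbb{N}$: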

$$\mathcal{E}_n\leqslant w_n+\lambda(\lambda\alpha+(1-\alpha)^2)h_n+\lambda\alpha(h_n-h_{n-1})+(1+\lambda)\alpha\delta_n.\tag{66}$$
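Subtracting (65) from (64) and dividing by $\alpha\lambda$ shows that the rewriting rests on the identity $2\langle x_n-x_n^*,x_n-x_{n-1}\rangle=h_n-h_{n-1}+\delta_n-\gamma_n^*+2\langle x_{n-1}-x_{n-1}^*,x_n^*-x_{n-1}^*\rangle$. Step (66) then only uses the sign of the last inner product. The sketch below probes both facts numerically, under two assumptions. First, $X^*$ is replaced by the box $[-1,1]^N$, an arbitrary closed convex set on which projection is a coordinate-wise clip. Second, $\gamma_n^*$ is read as $\|x_n^*-x_{n-1}^*\|^2$, the reading under which the identity holds.

```python
import numpy as np


def project_box(z, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^N (stand-in for X^*)."""
    return np.clip(z, lo, hi)


rng = np.random.default_rng(1)
x_prev, x = 3.0 * rng.standard_normal((2, 6))   # x_{n-1}, x_n
xs_prev, xs = project_box(x_prev), project_box(x)  # x_{n-1}^*, x_n^*

h = np.sum((x - xs) ** 2)                   # h_n
h_prev = np.sum((x_prev - xs_prev) ** 2)    # h_{n-1}
delta = np.sum((x - x_prev) ** 2)           # delta_n
gamma = np.sum((xs - xs_prev) ** 2)         # gamma_n^* (assumed definition)
cross_prev = (x_prev - xs_prev) @ (xs - xs_prev)

# Identity linking (64) and (65)
lhs = 2 * (x - xs) @ (x - x_prev)
rhs = (h - h_prev) + delta - gamma + 2 * cross_prev
print(lhs, rhs)

# Projection property behind (66): this inner product is nonpositive
print(cross_prev <= 0)
```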

By exploiting the expression (65), we can show the following lemma. The proof can be found in Section A.6.

Lemma 5.

For any $n\in\mathbb{N}$, we have that:

$$\begin{aligned}\mathcal{E}_{n+1}-\mathcal{E}_n\leqslant{}&-\lambda w_{n+1}+\lambda\alpha(\lambda+\alpha-1)(h_{n+1}-h_n)\\&+(\lambda\alpha+\alpha-1)\delta_{n+1}-\alpha(1-\alpha-\lambda\alpha)\delta_n.\end{aligned}\tag{67}$$

This inequality combined with (66), written at index $n+1$, guarantees that for any $\nu>0$,

$$\begin{aligned}\mathcal{E}_{n+1}-\mathcal{E}_n+\nu\mathcal{E}_{n+1}\leqslant{}&(\nu-\lambda)w_{n+1}+\lambda\alpha(\lambda+\alpha-1+\nu)(h_{n+1}-h_n)\\&+((1+\lambda)\alpha(1+\nu)-1)\delta_{n+1}-\alpha(1-\alpha-\lambda\alpha)\delta_n\\&+\nu\lambda(\lambda\alpha+(1-\alpha)^2)h_{n+1}.\end{aligned}\tag{68}$$

We make the following choice of parameters:

$$\alpha=1-\omega\sqrt{\kappa},\quad\nu=\tau\sqrt{\kappa},\quad\lambda=1-\alpha-\nu=(\omega-\tau)\sqrt{\kappa}.\tag{69}$$

With this choice, $\lambda+\alpha-1+\nu=0$, so the term in $h_{n+1}-h_n$ vanishes from (68), and the remaining coefficients give:

$$\begin{aligned}\mathcal{E}_{n+1}-\mathcal{E}_n+\tau\sqrt{\kappa}\,\mathcal{E}_{n+1}\leqslant{}&(2\tau-\omega)\sqrt{\kappa}\,w_{n+1}-\left(\omega\tau+(\omega-\tau)^2+\omega\tau(\omega-\tau)\sqrt{\kappa}\right)\kappa\,\delta_{n+1}\\&-(1-\omega\sqrt{\kappa})\left(\tau+\omega(\omega-\tau)\sqrt{\kappa}\right)\sqrt{\kappa}\,\delta_n\\&+\tau(\omega-\tau)\left(\omega-\tau+\omega\tau\sqrt{\kappa}\right)\kappa^{\frac{3}{2}}h_{n+1}.\end{aligned}\tag{70}$$
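The coefficient bookkeeping from (66)–(67) to (68), and then to (70) under (69), can be checked numerically. The sketch below computes the coefficients of (68) from $(\alpha,\lambda,\nu)$ and compares them with closed forms in $(\omega,\tau,\kappa)$ at random parameter values. In particular, it confirms that the $\delta_n$ coefficient $-\alpha(1-\alpha-\lambda\alpha)$ equals $-(1-\omega\sqrt{\kappa})(\tau+\omega(\omega-\tau)\sqrt{\kappa})\sqrt{\kappa}$. The sampling ranges are arbitrary choices for illustration.

```python
import math
import random


def coeffs_68(alpha, lam, nu):
    """Coefficients of (68), i.e. (67) + nu * (66) written at index n+1."""
    return {
        "w": nu - lam,                                      # w_{n+1}
        "dh": lam * alpha * (lam + alpha - 1 + nu),         # h_{n+1} - h_n
        "d_next": (1 + lam) * alpha * (1 + nu) - 1,         # delta_{n+1}
        "d_now": -alpha * (1 - alpha - lam * alpha),        # delta_n
        "h": nu * lam * (lam * alpha + (1 - alpha) ** 2),   # h_{n+1}
    }


def coeffs_70(omega, tau, kappa):
    """Closed forms after substituting the parameter choice (69)."""
    s = math.sqrt(kappa)
    return {
        "w": (2 * tau - omega) * s,
        "dh": 0.0,
        "d_next": -(omega * tau + (omega - tau) ** 2
                    + omega * tau * (omega - tau) * s) * kappa,
        "d_now": -(1 - omega * s) * (tau + omega * (omega - tau) * s) * s,
        "h": tau * (omega - tau) * (omega - tau + omega * tau * s) * kappa ** 1.5,
    }


random.seed(0)
max_err = 0.0
for _ in range(1000):
    kappa = random.uniform(1e-4, 0.1)
    omega = random.uniform(0.1, 1 / math.sqrt(kappa))  # keeps alpha > 0
    tau = random.uniform(0.0, omega)                   # keeps lambda >= 0
    s = math.sqrt(kappa)
    alpha, nu = 1 - omega * s, tau * s
    lam = 1 - alpha - nu                               # = (omega - tau) sqrt(kappa)
    c68, c70 = coeffs_68(alpha, lam, nu), coeffs_70(omega, tau, kappa)
    max_err = max(max_err, max(abs(c68[k] - c70[k]) for k in c68))
print(max_err)  # of the order of machine precision
```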

Note that we consider a parameter $\alpha>0$, which implies that $1-\omega\sqrt{\kappa}>0$. Suppose in addition that $\omega>2\tau$. Then the coefficients of $\delta_{n+1}$ and $\delta_n$ in (70) are nonpositive, so both terms can be dropped, and we get that for any $n\in\mathbb{N}$,

$$\mathcal{E}_{n+1}-\mathcal{E}_n+\tau\sqrt{\kappa}\,\mathcal{E}_{n+1}\leqslant-(\omega-2\tau)\sqrt{\kappa}\,w_{n+1}+\tau(\omega-\tau)\left(\omega-\tau+\omega\tau\sqrt{\kappa}\right)\kappa^{\frac{3}{2}}h_{n+1}.\tag{71}$$

As $F$ satisfies the assumption $\mathcal{G}^2_\mu$, we have $\kappa h_{n+1}\leqslant w_{n+1}$. Since $\omega-2\tau>0$, this yields:

$$\mathcal{E}_{n+1}-\mathcal{E}_n+\tau\sqrt{\kappa}\,\mathcal{E}_{n+1}\leqslant\left(\tau(\omega-\tau)\left(\omega-\tau+\omega\tau\sqrt{\kappa}\right)-\omega+2\tau\right)\kappa^{\frac{3}{2}}h_{n+1}.\tag{72}$$

Thus, if

$$\left(1-\omega\sqrt{\kappa}\right)\tau^3-\omega\left(2-\omega\sqrt{\kappa}\right)\tau^2+(\omega^2+2)\tau-\omega\leqslant 0,\tag{73}$$

then for any $n\in\mathbb{N}$,

$$\mathcal{E}_{n+1}-\mathcal{E}_n+\tau\sqrt{\kappa}\,\mathcal{E}_{n+1}\leqslant 0.\tag{74}$$
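The left-hand side of (73) is the expanded form of the bracket in (72). The sketch below compares the two expressions at random parameter values. The helper names are hypothetical, and the sampling ranges are arbitrary.

```python
import math
import random


def coeff_72(tau, omega, kappa):
    """Bracket in (72), the factor in front of kappa^{3/2} h_{n+1}."""
    s = math.sqrt(kappa)
    return tau * (omega - tau) * (omega - tau + omega * tau * s) - omega + 2 * tau


def cubic_73(tau, omega, kappa):
    """Left-hand side of (73)."""
    s = math.sqrt(kappa)
    return ((1 - omega * s) * tau ** 3 - omega * (2 - omega * s) * tau ** 2
            + (omega ** 2 + 2) * tau - omega)


random.seed(1)
gaps = []
for _ in range(1000):
    kappa = random.uniform(1e-4, 0.1)
    omega = random.uniform(0.1, 1 / math.sqrt(kappa))
    tau = random.uniform(0.0, omega / 2)
    gaps.append(abs(coeff_72(tau, omega, kappa) - cubic_73(tau, omega, kappa)))
print(max(gaps))  # only floating-point rounding remains
```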

Note that the solutions of (73) automatically satisfy $\omega>2\tau$. Elementary computations (namely (63) with $\nu=\tau\sqrt{\kappa}$) show that (74) implies

$$\mathcal{E}_n\leqslant\left(1-\tau\sqrt{\kappa}+\tau^2\kappa\right)^n\mathcal{E}_0.\tag{75}$$
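The decay (74) can be observed on a small example. The sketch below runs the smooth form of (V-FISTA), $y_n=x_n+\alpha(x_n-x_{n-1})$ and $x_{n+1}=y_n-\frac{1}{L}\nabla F(y_n)$ with $x_{-1}=x_0$, and tracks the energy (61). The test problem is a synthetic, rank-deficient least-squares instance $F(x)=\frac12\|Ax-b\|^2$, chosen here for illustration; it is not taken from the paper. This $F$ is not strongly convex, but it satisfies $\mathcal{G}^2_\mu$ with $\mu$ the smallest nonzero eigenvalue of $A^\top A$. The parameters are $\omega=\frac32$ and $\tau=\frac{2}{3\omega}$, which fall under Lemma 6 since $\kappa=0.01$. The smooth form of the scheme is our assumption for the case without a nonsmooth part.

```python
import numpy as np

# Hypothetical instance: A has prescribed singular values and a nontrivial kernel.
rng = np.random.default_rng(0)
m, d, r = 30, 12, 6
U, _ = np.linalg.qr(rng.standard_normal((m, r)))
V, _ = np.linalg.qr(rng.standard_normal((d, r)))
sigma = np.linspace(1.0, 0.1, r)            # singular values of A
A = U @ np.diag(sigma) @ V.T
b = rng.standard_normal(m)

L, mu = sigma[0] ** 2, sigma[-1] ** 2       # L = 1, mu = 0.01
kappa = mu / L
A_pinv = np.linalg.pinv(A)
x_ls = A_pinv @ b                           # one minimizer


def F(x):
    return 0.5 * np.sum((A @ x - b) ** 2)


def grad_F(x):
    return A.T @ (A @ x - b)


def project(x):
    """Projection onto X^* = x_ls + ker(A)."""
    return x - A_pinv @ (A @ (x - x_ls))


F_star = F(x_ls)
omega = 1.5
tau = 2 / (3 * omega)                       # choice covered by Lemma 6
alpha = 1 - omega * np.sqrt(kappa)          # parameters (69)
lam = (omega - tau) * np.sqrt(kappa)
nu = tau * np.sqrt(kappa)


def energy(x, x_prev):
    """Lyapunov energy (61)."""
    e = x - project(x)
    mom = x - x_prev + lam * e
    return (2 / L) * (F(x) - F_star) + alpha * (mom @ mom) + lam * (1 - alpha) ** 2 * (e @ e)


x_prev = x = rng.standard_normal(d)         # x_{-1} = x_0
energies = [energy(x, x_prev)]
for _ in range(100):
    y = x + alpha * (x - x_prev)            # inertial step
    x_prev, x = x, y - grad_F(y) / L        # gradient step with s = 1/L
    energies.append(energy(x, x_prev))
energies = np.array(energies)
ratios = energies[1:] * (1 + nu) / energies[:-1]
print(ratios.max())  # (74) predicts a value <= 1
```

Because the energies are sums of squares, (74) is equivalent to every printed ratio being at most 1. The run illustrates the bound on one instance and is not a substitute for the proof.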

Note that since $x_{-1}=x_0$,

$$\mathcal{E}_0=w_0+\lambda\left(\lambda\alpha+(1-\alpha)^2\right)h_0=w_0+\left((\omega-\tau)^2+(\omega-\tau)\omega\tau\sqrt{\kappa}\right)\kappa h_0,\tag{76}$$

and, using the assumption $\mathcal{G}^2_\mu$,

$$\mathcal{E}_0\leqslant\left(1+(\omega-\tau)^2+(\omega-\tau)\omega\tau\sqrt{\kappa}\right)w_0.\tag{77}$$

Moreover, $w_n\leqslant\mathcal{E}_n$ for any $n\in\mathbb{N}$. Thus, if (73) is satisfied, then for any $n\in\mathbb{N}$:

$$F(x_n)-F^*\leqslant\left(1+(\omega-\tau)^2+(\omega-\tau)\omega\tau\sqrt{\kappa}\right)\left(1-\tau\sqrt{\kappa}+\tau^2\kappa\right)^n(F(x_0)-F^*).\tag{78}$$
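The bound (78) can be turned into a worst-case iteration count. The helper below is a hypothetical name, not from the paper. It returns the smallest $n$ for which the right-hand side of (78) is at most $\varepsilon\,(F(x_0)-F^*)$, using the choice $\omega=\frac32$, $\tau=\frac{2}{3\omega}$ suggested by Lemma 6. It evaluates only the guarantee, not the actual behavior of the iterates.

```python
import math


def iterations_for_accuracy(omega, tau, kappa, eps):
    """Smallest n with constant * rate**n <= eps, where constant and rate are the factors in (78)."""
    s = math.sqrt(kappa)
    constant = 1 + (omega - tau) ** 2 + (omega - tau) * omega * tau * s
    rate = 1 - tau * s + tau ** 2 * kappa   # < 1 as long as tau * sqrt(kappa) < 1
    # constant * rate**n <= eps  <=>  n >= log(constant / eps) / (-log(rate))
    return max(0, math.ceil(math.log(constant / eps) / -math.log(rate)))


omega = 1.5
tau = 2 / (3 * omega)
for kappa in (1e-2, 1e-4, 1e-6):
    print(kappa, iterations_for_accuracy(omega, tau, kappa, 1e-6))
```

Consistent with a rate of order $\tau\sqrt{\kappa}$, the count grows roughly like $1/\sqrt{\kappa}$ as $\kappa$ decreases.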

In addition, applying the inequality $\|u\|^2\leqslant 2\|u+v\|^2+2\|v\|^2$ with $u=x_n-x_{n-1}$ and $v=\lambda(x_n-x_n^*)$, we get that:

$$\delta_n\leqslant 2b_n+2\lambda^2h_n,\tag{79}$$

where $b_n=\|\lambda(x_n-x_n^*)+x_n-x_{n-1}\|^2$. The assumption $\mathcal{G}^2_\mu$ gives that

$$\delta_n\leqslant 2b_n+\frac{2\lambda^2}{\kappa}w_n,\tag{80}$$

and, since $\alpha b_n\leqslant\mathcal{E}_n$ and $w_n\leqslant\mathcal{E}_n$ by definition of $\mathcal{E}_n$, we obtain

$$\delta_n\leqslant\left(\frac{2}{\alpha}+\frac{2\lambda^2}{\kappa}\right)\mathcal{E}_n.\tag{81}$$

By combining the above inequality with (75), we can finally prove that if (73) is valid, then

$$\|x_n-x_{n-1}\|=\mathcal{O}\left(e^{-\frac{1}{2}\tau\sqrt{\kappa}\left(1-\tau\sqrt{\kappa}\right)n}\right).\tag{82}$$
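The last step can be spelled out. Write $C$ for the constant in (81) and $r=\tau\sqrt{\kappa}$; both shorthands are used only here. The step combines (81), (75) and the inequality $1+u\leqslant e^{u}$ with $u=-r+r^2$:

$$\|x_n-x_{n-1}\|^2=\delta_n\leqslant C\,\mathcal{E}_n\leqslant C\,\mathcal{E}_0\,(1-r+r^2)^n\leqslant C\,\mathcal{E}_0\,e^{-r(1-r)n}.$$

Taking square roots gives (82).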

5.3 Proof of Corollary 2

Let $(x_n)_{n\in\mathbb{N}}$ be the sequence given by (V-FISTA) for some $\alpha>0$ and $s=\frac{1}{L}$. According to Theorem 2, if $\alpha=1-\omega\sqrt{\kappa}$ for some $\omega\in\left(0,\frac{1}{\sqrt{\kappa}}\right)$, then (19) and (20) are valid for any $\tau>0$ satisfying:

$$\left(1-\omega\sqrt{\kappa}\right)\tau^3-\omega\left(2-\omega\sqrt{\kappa}\right)\tau^2+(\omega^2+2)\tau-\omega\leqslant 0.\tag{83}$$

Corollary 2 relies on the following lemma.

Lemma 6.

Let

$$P:(\tau,\omega,\kappa)\mapsto\left(1-\omega\sqrt{\kappa}\right)\tau^3-\omega\left(2-\omega\sqrt{\kappa}\right)\tau^2+(\omega^2+2)\tau-\omega.$$

If $\kappa\leqslant\frac{1}{10}$, then for any $\omega\geqslant\frac{3}{2}$, $P\left(\frac{2}{3\omega},\omega,\kappa\right)<0$.

Proof. Given the expression of $P$, simple computations give that, for any $\kappa\in(0,1)$ and $\omega>0$,

$$P\left(\frac{2}{3\omega},\omega,\kappa\right)=\left(1-\omega\sqrt{\kappa}\right)\frac{8-12\omega^2}{27\omega^3}+\frac{8-3\omega^2}{9\omega}.\tag{84}$$

We define the function ΦΦ\Phiroman_Φ as follows

$$\Phi(\omega;\kappa)=-9\omega^4+12\omega^3\sqrt{\kappa}+12\omega^2-8\omega\sqrt{\kappa}+8,$$

so that $\frac{\Phi(\omega;\kappa)}{27\omega^3}=P\left(\frac{2}{3\omega},\omega,\kappa\right)$. We get that:

$$\frac{\partial\Phi}{\partial\omega}(\omega;\kappa)=-36\omega^2(\omega-\sqrt{\kappa})+24\omega-8\sqrt{\kappa}.\tag{85}$$

Consequently, if $\omega\geqslant\frac{3}{2}$ and $\kappa\leqslant\frac{1}{10}$, we have $\sqrt{\kappa}<\frac12$, hence $\omega-\sqrt{\kappa}>1$ and

$$\frac{\partial\Phi}{\partial\omega}(\omega;\kappa)\leqslant-36\omega^2+24\omega<0.$$

Moreover, $\Phi\left(\frac{3}{2};\kappa\right)=-\frac{169}{16}+\frac{57}{2}\sqrt{\kappa}$, which is strictly negative if $\kappa\leqslant\frac{1}{10}$ (indeed $\frac{57}{2}\sqrt{1/10}\approx 9.01<\frac{169}{16}\approx 10.56$). Since $\Phi(\cdot;\kappa)$ is decreasing on $\left[\frac32,+\infty\right)$, we deduce that $\Phi(\omega;\kappa)<0$ for any $\omega\geqslant\frac{3}{2}$. As $\frac{\Phi(\omega;\kappa)}{27\omega^3}=P\left(\frac{2}{3\omega},\omega,\kappa\right)$, this proves the lemma.
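The computations in the proof of Lemma 6 can be checked numerically for $\kappa=\frac{1}{10}$ on a grid of $\omega\in\left[\frac32,5\right]$. The sketch checks three things: the relation between $\Phi$ and $P$ from (84), the sign of $P\left(\frac{2}{3\omega},\omega,\kappa\right)$, and the value of $\Phi\left(\frac32;\kappa\right)$. The grid is an arbitrary choice; a grid check only illustrates the lemma and does not replace the monotonicity argument.

```python
import math


def P(tau, omega, kappa):
    """Cubic of Lemma 6."""
    s = math.sqrt(kappa)
    return ((1 - omega * s) * tau ** 3 - omega * (2 - omega * s) * tau ** 2
            + (omega ** 2 + 2) * tau - omega)


def Phi(omega, kappa):
    """Quartic with Phi / (27 omega^3) = P(2 / (3 omega), omega, kappa)."""
    s = math.sqrt(kappa)
    return -9 * omega ** 4 + 12 * omega ** 3 * s + 12 * omega ** 2 - 8 * omega * s + 8


kappa = 0.1
omegas = [1.5 + 0.01 * k for k in range(351)]  # grid on [1.5, 5]
identity_err = max(abs(Phi(w, kappa) / (27 * w ** 3) - P(2 / (3 * w), w, kappa))
                   for w in omegas)
worst = max(P(2 / (3 * w), w, kappa) for w in omegas)
phi_32 = Phi(1.5, kappa)
print(identity_err, worst, phi_32, -169 / 16 + 57 / 2 * math.sqrt(kappa))
```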
In fact, $\omega\mapsto\frac{2}{3\omega}$ is a lower bound on the highest value of $\tau$ satisfying (22) when $\omega\geqslant\frac{3}{2}$ and $\kappa=\frac{1}{10}$, as illustrated in Figure 3.

Figure 3: Comparison of the highest rate $\tau$ satisfying (22) for $\kappa=\frac{1}{10}$ and $\omega>0$ with the function $\omega\mapsto\frac{2}{3\omega}$.
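The comparison in Figure 3 can be approximated numerically. We assume that "highest value of $\tau$" means the first positive root of the cubic $P(\cdot,\omega,\kappa)$ of Lemma 6; this reading is our assumption. The sketch locates that root by a scan followed by bisection. It relies on $P(0)=-\omega<0$ and $P\left(\frac{\omega}{2}\right)=\frac{\omega^3(1+\omega\sqrt{\kappa})}{8}>0$. It then prints the root next to $\frac{2}{3\omega}$ for a few values of $\omega$. The helper name `first_root` is hypothetical.

```python
import math


def P(tau, omega, kappa):
    s = math.sqrt(kappa)
    return ((1 - omega * s) * tau ** 3 - omega * (2 - omega * s) * tau ** 2
            + (omega ** 2 + 2) * tau - omega)


def first_root(omega, kappa, steps=2000):
    """Smallest tau > 0 with P(tau, omega, kappa) = 0.

    P(0) < 0 and P(omega / 2) > 0, so scanning (0, omega / 2] finds the
    first sign change, which bisection then refines.
    """
    lo, hi = 0.0, omega / 2
    for k in range(1, steps + 1):
        t = (omega / 2) * k / steps
        if P(t, omega, kappa) > 0:
            hi = t
            break
        lo = t
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if P(mid, omega, kappa) > 0:
            hi = mid
        else:
            lo = mid
    return hi


kappa = 0.1
for omega in (1.5, 2.0, 2.5, 3.0):
    print(f"omega={omega}: first root={first_root(omega, kappa):.4f}, "
          f"2/(3 omega)={2 / (3 * omega):.4f}")
```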

According to Lemma 6, if $\kappa\leqslant\frac{1}{10}$ and $\omega\in\left[\frac{3}{2},\frac{1}{\sqrt{\kappa}}\right)$, then (19) and (20) are valid with $\tau=\frac{2}{3\omega}$. Equivalently, writing $\alpha=1-\theta$ with $\theta=\omega\sqrt{\kappa}\in\left[\frac{3}{2}\sqrt{\kappa},1\right)$, (19) and (20) are satisfied with $\tau=\frac{2}{3\theta}\sqrt{\kappa}$. This leads to the conclusion of Corollary 2.

Appendix A Technical proofs

A.1 Proof of Theorem 3

Our analysis follows that introduced in [8]. We set $\alpha=\left(2-\frac{\sqrt{2}}{2}\right)\sqrt{\mu}$ and consider the following Lyapunov energy:

$$\mathcal{E}(t)=F(x(t))-F^*+\frac{1}{2}\|\lambda(x(t)-x^*(t))+\dot{x}(t)\|^2+\frac{\xi}{2}\|x(t)-x^*(t)\|^2,\tag{86}$$

with $\lambda=\sqrt{\mu}$ and $\xi=-\left(1-\frac{\sqrt{2}}{2}\right)\mu$.

Following the discussion of Section 4.2.2, the assumptions of Theorem 3 ensure that $\mathcal{E}$ is right-differentiable and, for all $t\geqslant t_0$,

$$\dot{\mathcal{E}}(t)\leqslant-\lambda\langle\nabla F(x(t)),x(t)-x^*(t)\rangle+(\lambda-\alpha)\|\dot{x}(t)\|^2+(\xi+\lambda(\lambda-\alpha))\langle x(t)-x^*(t),\dot{x}(t)\rangle,\tag{87}$$

where $\dot{\mathcal{E}}$ denotes the right derivative of $\mathcal{E}$. Using the convexity of $F$ and replacing the parameters by their values,

$$\dot{\mathcal{E}}(t)\leqslant-\sqrt{\mu}(F(x(t))-F^*)-\left(1-\frac{\sqrt{2}}{2}\right)\sqrt{\mu}\|\dot{x}(t)\|^2-(2-\sqrt{2})\mu\langle x(t)-x^*(t),\dot{x}(t)\rangle.\tag{88}$$

Let us define $\delta=(2-\sqrt{2})\sqrt{\mu}$. The above inequality guarantees that, for all $t\geqslant t_0$:

$$\dot{\mathcal{E}}(t)+\delta\mathcal{E}(t)\leqslant\left(1-\sqrt{2}\right)\sqrt{\mu}\left(F(x(t))-F^*\right)+\frac{\sqrt{2}-1}{2}\mu^{\frac{3}{2}}\|x(t)-x^*(t)\|^2.\tag{89}$$

Since $1-\sqrt{2}<0$, $F$ satisfies $\mathcal{G}^2_\mu$ (that is, $F(x(t))-F^*\geqslant\frac{\mu}{2}d(x(t),X^*)^2$), and $\|x(t)-x^*(t)\|=d(x(t),X^*)$ for all $t\geqslant t_0$, we obtain that for all $t\geqslant t_0$:

$$\dot{\mathcal{E}}(t)+\delta\mathcal{E}(t)\leqslant\left(\left(1-\sqrt{2}\right)\sqrt{\mu}\,\frac{\mu}{2}+\frac{\sqrt{2}-1}{2}\mu^{\frac{3}{2}}\right)\|x(t)-x^{*}(t)\|^{2}\leqslant 0. \tag{90}$$

We refer the reader to the proof of [8, Theorem 1] for further details on each of the above steps and a discussion of the parameter values. Lemma 39 then guarantees that, for all $t\geqslant t_0$,

$$\mathcal{E}(t)\leqslant\mathcal{E}(t_{0})\,e^{-(2-\sqrt{2})\sqrt{\mu}(t-t_{0})}. \tag{91}$$
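The sketch below is a quick sanity check of the last two steps. It evaluates the coefficient of $\|x(t)-x^*(t)\|^2$ in (90), which should vanish identically in $\mu$. It also illustrates the Grönwall-type consequence (91) on a discrete toy model. The toy ODE $\mathcal{E}'=-\delta\mathcal{E}-s(t)$, with a hypothetical nonnegative $s$, only stands in for $\dot{\mathcal{E}}+\delta\mathcal{E}\leqslant 0$; it is not the Heavy Ball energy itself. The names `coefficient_90` and `euler_energy` are ours.

```python
import math


def coefficient_90(mu: float) -> float:
    """Coefficient of ||x(t) - x*(t)||^2 on the right-hand side of (90)."""
    s2 = math.sqrt(2.0)
    return (1.0 - s2) * math.sqrt(mu) * mu / 2.0 + (s2 - 1.0) / 2.0 * mu ** 1.5


def euler_energy(delta, e0, dissipation, step, n_steps):
    """Explicit Euler scheme for the toy ODE E'(t) = -delta*E(t) - s(t), with s >= 0.

    `dissipation` is a hypothetical nonnegative function chosen for illustration only.
    """
    energies = [e0]
    for k in range(n_steps):
        e = energies[-1]
        energies.append(e - step * (delta * e + dissipation(k * step)))
    return energies


mu = 0.5
delta = (2.0 - math.sqrt(2.0)) * math.sqrt(mu)  # decay rate appearing in (91)
step, n_steps = 1e-3, 5000
energies = euler_energy(delta, 1.0, lambda t: 0.01 * math.sin(t) ** 2, step, n_steps)

# Discrete analogue of (91): E_k <= E_0 * exp(-delta * k * step).
bound_holds = all(
    e <= energies[0] * math.exp(-delta * k * step) + 1e-12
    for k, e in enumerate(energies)
)
```

The explicit Euler step gives $\mathcal{E}_{k+1}\leqslant(1-\delta h)\mathcal{E}_k\leqslant e^{-\delta h}\mathcal{E}_k$ whenever $\mathcal{E}_k\geqslant 0$ and $\delta h<1$. This is why the discrete bound holds exactly, not just approximately.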

Since $F$ satisfies $\mathcal{G}^2_\mu$, elementary computations show that:

$$F(x(t))-F^{*}+\xi\|x(t)-x^{*}(t)\|^{2}\geqslant(\sqrt{2}-1)(F(x(t))-F^{*}), \tag{92}$$

and consequently,

$$\forall t\geqslant t_{0},\quad\mathcal{E}(t)\geqslant\left(\sqrt{2}-1\right)(F(x(t))-F^{*})+\frac{1}{2}\|\lambda(x(t)-x^{*}(t))+\dot{x}(t)\|^{2}. \tag{93}$$

This inequality implies that for all $t\geqslant t_0$:

$$F(x(t))-F^{*}\leqslant\frac{1}{\sqrt{2}-1}\mathcal{E}(t), \tag{94}$$

and

$$\|\lambda(x(t)-x^{*}(t))+\dot{x}(t)\|^{2}\leqslant 2\mathcal{E}(t). \tag{95}$$

The first statement of Theorem 3 follows by combining (91) and (94) and rewriting $\mathcal{E}(t_0)$ (see [8, Section 6.1] for further details). We prove the second result as follows.
Using the inequality $\|u\|^2\leqslant 2\|u+v\|^2+2\|v\|^2$ with $u=\dot{x}(t)$ and $v=\lambda(x(t)-x^*(t))$, and then $\mathcal{G}^2_\mu$ for the second line, we get that:

$$\begin{aligned}\|\dot{x}(t)\|^{2}&\leqslant 2\|\lambda(x(t)-x^{*}(t))+\dot{x}(t)\|^{2}+2\mu\|x(t)-x^{*}(t)\|^{2}\\&\leqslant 2\|\lambda(x(t)-x^{*}(t))+\dot{x}(t)\|^{2}+4(F(x(t))-F^{*}).\end{aligned} \tag{96}$$

Combining (96) with (94) and (95), we obtain:

$$\|\dot{x}(t)\|^{2}\leqslant 4\left(1+\frac{1}{\sqrt{2}-1}\right)\mathcal{E}(t). \tag{97}$$

The bound (91) on the energy leads us to the conclusion:

$$\|\dot{x}(t)\|=\mathcal{O}\left(e^{-\left(1-\frac{\sqrt{2}}{2}\right)\sqrt{\mu}\,t}\right). \tag{98}$$
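The following sketch checks the arithmetic behind (96) to (98). It tests the elementary inequality used in (96) on random vectors. It confirms that the constant in (97) simplifies to $4(1+\frac{1}{\sqrt{2}-1})=8+4\sqrt{2}$, because $\frac{1}{\sqrt{2}-1}=\sqrt{2}+1$. It also encodes that taking the square root of the energy bound (91) halves its exponent. All helper names are illustrative.

```python
import math
import random

SQRT2 = math.sqrt(2.0)


def sq_norm(v):
    """Squared Euclidean norm of a list of floats."""
    return sum(x * x for x in v)


# Constant of (97): 4*(1 + 1/(sqrt(2) - 1)), which equals 8 + 4*sqrt(2) (about 13.66).
C97 = 4.0 * (1.0 + 1.0 / (SQRT2 - 1.0))

# Worst case allowed by (94)-(96): ||x'||^2 <= 2*(2E) + 4*E/(sqrt(2) - 1) = C97 * E.
worst_case_per_unit_energy = 2.0 * 2.0 + 4.0 / (SQRT2 - 1.0)


def velocity_rate(mu):
    """Exponent in (98): half of the energy decay rate (2 - sqrt(2)) * sqrt(mu) of (91)."""
    return (2.0 - SQRT2) * math.sqrt(mu) / 2.0


def elementary_inequality_holds(dim=5, trials=1000, seed=0):
    """Random check of ||u||^2 <= 2||u + v||^2 + 2||v||^2, the inequality used in (96)."""
    rng = random.Random(seed)
    for _ in range(trials):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        u_plus_v = [a + b for a, b in zip(u, v)]
        if sq_norm(u) > 2.0 * sq_norm(u_plus_v) + 2.0 * sq_norm(v) + 1e-12:
            return False
    return True
```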

A.2 Proof of Lemma 39

Let $\phi'$ denote the derivative of $\phi$ where it is well defined. According to [38], the function $\phi$ is differentiable except on a countable set of points. This implies that there exist $N\in\mathbb{N}^*\cup\{+\infty\}$ and a sequence $(t_i)_{i\in\llbracket 1,N\rrbracket}$ such that for any $i\in\llbracket 0,N-1\rrbracket$ and $t\in(t_i,t_{i+1})$, $\phi'(t)$ is well defined and equal to $\phi_+(t)$. We suppose that the sequence is ordered such that $t_0<t_i<t_{i+1}$ for any $i$ and that $t_N=+\infty$ when $N\neq+\infty$.
Suppose that $t\in(t_0,t_1)$.

  • If $\phi$ is differentiable at $t_0$, then $\phi$ is differentiable on the interval $[t_0,t_1)$ and $\phi'=\phi_+$ on this interval. Consequently, inequality (38) ensures that

    $$\phi(t)\leqslant\phi(t_{0})+\int_{t_{0}}^{t}\psi(u)\,du.$$
  • If $\phi$ is not differentiable at $t_0$, then inequality (38) guarantees that for $h>0$ sufficiently small,

    $$\phi(t_{0}+h)\leqslant\phi(t_{0})+h\,\psi(t_{0}).$$

    Then, the previous discussion shows that $\phi$ is differentiable on $[t_0+h,t_1)$. As a consequence, there exists $H\in(0,t-t_0)$ such that for any $h\in(0,H)$:

    $$\phi(t)\leqslant\phi(t_{0}+h)+\int_{t_{0}+h}^{t}\psi(u)\,du\leqslant\phi(t_{0})+\int_{t_{0}}^{t}\psi(u)\,du+\int_{t_{0}}^{t_{0}+h}\left(\psi(t_{0})-\psi(u)\right)du.$$

    As this inequality is valid for any $h\in(0,H)$, letting $h\to 0^+$ makes the last integral vanish, and we obtain the desired inequality (39).

We now suppose that $t=t_1$. We just proved that (39) holds for all $t\in(t_0,t_1)$, that is,

$$\phi(t)\leqslant\phi(t_{0})+\int_{t_{0}}^{t}\psi(u)\,du\quad\text{for all } t\in(t_0,t_1).$$

Letting $t\to t_1^-$ and using the continuity of $\phi$ and of $t\mapsto\int_{t_0}^{t}\psi(u)\,du$, we get the same inequality at $t=t_1$.
By the same arguments, (39) holds for any $t>t_1$. Indeed, if $t>t_1$, then $t\in(t_i,t_{i+1})$ or $t=t_i$ for some $i\in\llbracket 1,N\rrbracket$. In both cases, we get the desired inequality by applying the above reasoning successively to the intervals $(t_j,t_{j+1})$ for $0\leqslant j\leqslant i$.
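To see Lemma 39 at work on a function with kinks, the sketch below uses two functions of our own choosing. The first is $\phi(t)=e^{-t}+|\sin t|$, which is continuous but not differentiable at $t=k\pi$. The second is $\psi(t)=-e^{-t}+|\cos t|$, which bounds the right derivative $\phi_+$ from above. As the proof above suggests, we assume that (38) is the bound $\phi_+\leqslant\psi$ and that (39) is the integrated bound $\phi(t)\leqslant\phi(t_0)+\int_{t_0}^{t}\psi(u)\,du$. The example and all function names are ours.

```python
import math


def phi(t):
    """Continuous test function; |sin t| makes it non-differentiable at t = k*pi, k >= 1."""
    return math.exp(-t) + abs(math.sin(t))


def psi(t):
    """Upper bound on the right derivative of phi.

    Away from the kinks, d/dt |sin t| = sign(sin t) * cos t <= |cos t|; at t = k*pi the
    right derivative of |sin t| equals 1 = |cos t|.
    """
    return -math.exp(-t) + abs(math.cos(t))


def max_violation(phi, psi, t0, t_end, n):
    """Largest value of phi(t) - phi(t0) - int_{t0}^{t} psi(u) du over a uniform grid.

    The integral is accumulated with the midpoint rule; the comparison lemma predicts
    a value <= 0 up to quadrature error.
    """
    step = (t_end - t0) / n
    integral = 0.0
    worst = -math.inf
    for k in range(n):
        integral += step * psi(t0 + (k + 0.5) * step)
        t = t0 + (k + 1) * step
        worst = max(worst, phi(t) - phi(t0) - integral)
    return worst


worst = max_violation(phi, psi, 0.0, 10.0, 100_000)
```

For this choice, the gap $\phi(t)-\phi(0)-\int_0^t\psi(u)\,du$ equals $|\sin t|-\int_0^t|\cos u|\,du\leqslant 0$, with equality on $[0,\pi/2]$. The computed maximum should therefore be zero up to quadrature error.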

A.3 Proof of Lemma 2

Let $n\in\mathbb{N}^*$. By rewriting

$$x_{n}-x_{n}^{*}=\frac{1}{2}\left((x_{n}-x_{n-1})+(x_{n-1}-x_{n-1}^{*})+(x_{n-1}^{*}-x_{n}^{*})+(x_{n}-x_{n}^{*})\right),$$

we get that:

$$\langle x_{n}-x_{n}^{*},x_{n}-x_{n-1}\rangle=\frac{1}{2}\delta_{n}+\frac{1}{2}\langle(x_{n-1}-x_{n-1}^{*})+(x_{n-1}^{*}-x_{n}^{*})+(x_{n}-x_{n}^{*}),x_{n}-x_{n-1}\rangle.$$

Noticing that $x_n-x_{n-1}=(x_n-x_n^*)+(x_n^*-x_{n-1}^*)+(x_{n-1}^*-x_{n-1})$ leads to:

$$\begin{aligned}
2\langle x_{n}-x_{n}^{*},x_{n}-x_{n-1}\rangle&=\delta_{n}+\langle x_{n-1}-x_{n-1}^{*},x_{n}-x_{n}^{*}\rangle+\langle x_{n-1}-x_{n-1}^{*},x_{n}^{*}-x_{n-1}^{*}\rangle\\
&\quad-h_{n-1}-\langle x_{n}^{*}-x_{n-1}^{*},x_{n}-x_{n}^{*}\rangle+\langle x_{n-1}-x_{n-1}^{*},x_{n}^{*}-x_{n-1}^{*}\rangle\\
&\quad-\gamma_{n}^{*}+\langle x_{n}-x_{n}^{*},x_{n}^{*}-x_{n-1}^{*}\rangle-\langle x_{n-1}-x_{n-1}^{*},x_{n}-x_{n}^{*}\rangle+h_{n}\\
&=h_{n}-h_{n-1}+\delta_{n}-\gamma_{n}^{*}+2\langle x_{n-1}-x_{n-1}^{*},x_{n}^{*}-x_{n-1}^{*}\rangle.
\end{aligned}$$
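The first claim of Lemma 2 is a purely algebraic identity. It therefore holds for arbitrary points $x_{n-1},x_n,x_{n-1}^*,x_n^*$, not only for projections onto $X^*$, and can be checked numerically. The sketch below assumes $h_n=\|x_n-x_n^*\|^2$, $\delta_n=\|x_n-x_{n-1}\|^2$ and $\gamma_n^*=\|x_n^*-x_{n-1}^*\|^2$. This is our reading of the notation, chosen because the expansion above goes through under it. The function name is hypothetical.

```python
import random


def sub(u, v):
    """Componentwise difference u - v."""
    return [a - b for a, b in zip(u, v)]


def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def lemma2_first_gap(x_prev, x_cur, p_prev, p_cur):
    """LHS minus RHS of the first claim of Lemma 2 (should vanish up to rounding).

    x_prev, x_cur stand for x_{n-1}, x_n; p_prev, p_cur for x_{n-1}^*, x_n^*.
    """
    h_cur = dot(sub(x_cur, p_cur), sub(x_cur, p_cur))
    h_prev = dot(sub(x_prev, p_prev), sub(x_prev, p_prev))
    delta = dot(sub(x_cur, x_prev), sub(x_cur, x_prev))
    gamma = dot(sub(p_cur, p_prev), sub(p_cur, p_prev))
    lhs = 2.0 * dot(sub(x_cur, p_cur), sub(x_cur, x_prev))
    rhs = h_cur - h_prev + delta - gamma + 2.0 * dot(sub(x_prev, p_prev), sub(p_cur, p_prev))
    return lhs - rhs


rng = random.Random(1)
gaps = []
for _ in range(200):
    points = [[rng.uniform(-3.0, 3.0) for _ in range(4)] for _ in range(4)]
    gaps.append(lemma2_first_gap(*points))
```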

The second claim is proved using the same approach. We rewrite

$$x_{n-1}-x_{n-1}^{*}=\frac{1}{2}\left((x_{n-1}-x_{n})+(x_{n}-x_{n}^{*})+(x_{n}^{*}-x_{n-1}^{*})+(x_{n-1}-x_{n-1}^{*})\right),$$

and consequently:

$$2\langle x_{n-1}-x_{n-1}^{*},x_{n}-x_{n-1}\rangle=-\delta_{n}+\langle(x_{n}-x_{n}^{*})+(x_{n}^{*}-x_{n-1}^{*})+(x_{n-1}-x_{n-1}^{*}),x_{n}-x_{n-1}\rangle.$$

By applying the same rewriting of $x_n-x_{n-1}$, simple calculations give:

$$\langle x_{n-1}-x_{n-1}^{*},x_{n}-x_{n-1}\rangle=\frac{1}{2}(h_{n}-h_{n-1}-\delta_{n}+\gamma_{n}^{*})+\langle x_{n}-x_{n}^{*},x_{n}^{*}-x_{n-1}^{*}\rangle.$$
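The second claim can be tested the same way. The sketch uses the same assumed reading of $h_n$, $\delta_n$ and $\gamma_n^*$ as squared distances, and arbitrary points, since the identity is algebraic. The function name is hypothetical.

```python
import random


def sub(u, v):
    """Componentwise difference u - v."""
    return [a - b for a, b in zip(u, v)]


def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def lemma2_second_gap(x_prev, x_cur, p_prev, p_cur):
    """LHS minus RHS of the second claim of Lemma 2 (should vanish up to rounding)."""
    h_cur = dot(sub(x_cur, p_cur), sub(x_cur, p_cur))
    h_prev = dot(sub(x_prev, p_prev), sub(x_prev, p_prev))
    delta = dot(sub(x_cur, x_prev), sub(x_cur, x_prev))
    gamma = dot(sub(p_cur, p_prev), sub(p_cur, p_prev))
    lhs = dot(sub(x_prev, p_prev), sub(x_cur, x_prev))
    rhs = 0.5 * (h_cur - h_prev - delta + gamma) + dot(sub(x_cur, p_cur), sub(p_cur, p_prev))
    return lhs - rhs


rng = random.Random(2)
gaps = []
for _ in range(200):
    points = [[rng.uniform(-3.0, 3.0) for _ in range(4)] for _ in range(4)]
    gaps.append(lemma2_second_gap(*points))
```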

A.4 Proof of Lemma 3

The first claim is straightforward, since Lemma 3.1 of [16] ensures that:

$$F(x_{n+1})-F(x_{n})\leqslant\frac{L}{2}\left(\|y_{n}-x_{n}\|^{2}-\|x_{n+1}-x_{n}\|^{2}\right).$$

By writing $y_n=x_n+\alpha_n(x_n-x_{n-1})$ and $\frac{2}{L}(F(x_{n+1})-F(x_n))=w_{n+1}-w_n$, we can conclude.

By applying Lemma 3.1 of [16] to another pair of points, we get that:

$$F(x_{n+1})-F^{*}\leqslant\frac{L}{2}\left(\|y_{n}-x_{n}^{*}\|^{2}-\|x_{n+1}-x_{n}^{*}\|^{2}\right).$$
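Both displays above are instances of a single descent inequality. We assume that Lemma 3.1 of [16] is the standard forward-backward estimate $F(Ty)+\frac{L}{2}\|Ty-x\|^2\leqslant F(x)+\frac{L}{2}\|y-x\|^2$ for all $x$. Here $Ty=\mathrm{prox}_{h/L}(y-\frac{1}{L}\nabla f(y))$ and $F=f+h$, with $f$ convex with $L$-Lipschitz gradient and $h$ convex. The sketch checks both instances, $x=x_n$ and $x=x_n^*$, along inertial iterates $y_n=x_n+\alpha(x_n-x_{n-1})$. It uses a hypothetical separable Lasso-type problem whose minimizer is unique, so $x_n^*=x^*$. The constant $\alpha=0.5$ is arbitrary and is not the V-FISTA parameter.

```python
import math


def soft_threshold(z, tau):
    """Proximal operator of tau*|.| evaluated at the scalar z."""
    return math.copysign(max(abs(z) - tau, 0.0), z)


def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


class DiagonalLasso:
    """Hypothetical test problem F(x) = 0.5*sum_i d_i (x_i - c_i)^2 + lam*||x||_1, d_i > 0."""

    def __init__(self, d, c, lam):
        self.d, self.c, self.lam = d, c, lam
        self.L = max(d)  # Lipschitz constant of the gradient of the smooth part

    def F(self, x):
        smooth = 0.5 * sum(di * (xi - ci) ** 2 for di, xi, ci in zip(self.d, x, self.c))
        return smooth + self.lam * sum(abs(xi) for xi in x)

    def T(self, y):
        """Forward-backward step prox_{h/L}(y - grad f(y) / L), computed coordinate-wise."""
        return [
            soft_threshold(yi - di * (yi - ci) / self.L, self.lam / self.L)
            for yi, di, ci in zip(y, self.d, self.c)
        ]

    def minimizer(self):
        """Unique minimizer, since every d_i > 0 makes F strongly convex."""
        return [soft_threshold(ci, self.lam / di) for di, ci in zip(self.d, self.c)]


def lemma31_holds_along_iterates(problem, x0, alpha, n_iter, tol=1e-10):
    """Check both instances of the descent inequality along y_n = x_n + alpha*(x_n - x_{n-1})."""
    x_star = problem.minimizer()
    f_star = problem.F(x_star)
    half_L = problem.L / 2.0
    x_prev, x = list(x0), list(x0)
    for _ in range(n_iter):
        y = [xi + alpha * (xi - xp) for xi, xp in zip(x, x_prev)]
        x_next = problem.T(y)
        # Instance with the point x_n (first claim of Lemma 3).
        if problem.F(x_next) - problem.F(x) > half_L * (sq_dist(y, x) - sq_dist(x_next, x)) + tol:
            return False
        # Instance with the point x_n^* (here the single minimizer x^*).
        if problem.F(x_next) - f_star > half_L * (sq_dist(y, x_star) - sq_dist(x_next, x_star)) + tol:
            return False
        x_prev, x = x, x_next
    return True
```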

It follows that:

$$
\begin{aligned}
w_{n+1}&\leqslant\|x_n+\alpha_n(x_n-x_{n-1})-x_n^*\|^2-\|(x_{n+1}-x_{n+1}^*)+(x_{n+1}^*-x_n^*)\|^2\\
&\leqslant h_n+\alpha_n^2\delta_n-h_{n+1}-\gamma_{n+1}^*+2\alpha_n\langle x_n-x_n^*,x_n-x_{n-1}\rangle\\
&\quad-2\langle x_{n+1}-x_{n+1}^*,x_{n+1}^*-x_n^*\rangle.
\end{aligned}
$$
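Under the definitions assumed below, the second inequality above is in fact an equality obtained by expanding the two squared norms. A minimal numerical sketch of that expansion follows. It assumes $h_n=\|x_n-x_n^*\|^2$, $\delta_n=\|x_n-x_{n-1}\|^2$ and $\gamma_n^*=\|x_n^*-x_{n-1}^*\|^2$; these definitions are inferred from the expansion itself and are not restated in this part of the appendix. Random vectors stand in for the iterates and for the points $x_n^*$, so the check only concerns the algebra, not the behaviour of (V-FISTA).

```python
import random


def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def sub(u, v):
    """Componentwise difference u - v."""
    return [a - b for a, b in zip(u, v)]


def add(u, v):
    """Componentwise sum u + v."""
    return [a + b for a, b in zip(u, v)]


def scale(c, u):
    """Scalar multiple c * u."""
    return [c * a for a in u]


def expansion_gap(dim=5, seed=0):
    """|first line - second line| of the display, for random data.

    Assumed definitions (inferred, not restated in the appendix):
    h_k = ||x_k - x_k^*||^2, delta_n = ||x_n - x_{n-1}||^2,
    gamma_{n+1}^* = ||x_{n+1}^* - x_n^*||^2.
    """
    rng = random.Random(seed)

    def vec():
        return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

    x_prev, x, x_next = vec(), vec(), vec()  # x_{n-1}, x_n, x_{n+1}
    xs, xs_next = vec(), vec()               # stand-ins for x_n^*, x_{n+1}^*
    alpha = rng.uniform(0.0, 1.0)            # stand-in for alpha_n

    # First line: difference of the two squared norms.
    y = add(x, scale(alpha, sub(x, x_prev)))
    u = add(sub(x_next, xs_next), sub(xs_next, xs))
    first = dot(sub(y, xs), sub(y, xs)) - dot(u, u)

    # Second line, written with the assumed h, delta and gamma^*.
    h_n = dot(sub(x, xs), sub(x, xs))
    h_next = dot(sub(x_next, xs_next), sub(x_next, xs_next))
    delta_n = dot(sub(x, x_prev), sub(x, x_prev))
    gamma_next = dot(sub(xs_next, xs), sub(xs_next, xs))
    second = (h_n + alpha**2 * delta_n - h_next - gamma_next
              + 2 * alpha * dot(sub(x, xs), sub(x, x_prev))
              - 2 * dot(sub(x_next, xs_next), sub(xs_next, xs)))
    return abs(first - second)


print(expansion_gap())
```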

Recall that the first claim of Lemma 2 ensures that:

$$
\langle x_n-x_n^*,x_n-x_{n-1}\rangle=\frac{1}{2}\left(h_n-h_{n-1}+\delta_n-\gamma_n^*\right)+\langle x_{n-1}-x_{n-1}^*,x_n^*-x_{n-1}^*\rangle,
$$

we can deduce that:

$$
\begin{aligned}
w_{n+1}&\leqslant(1+\alpha_n)h_n+(\alpha_n^2+\alpha_n)\delta_n-\alpha_nh_{n-1}-h_{n+1}-\gamma_{n+1}^*-\alpha_n\gamma_n^*\\
&\quad+2\alpha_n\langle x_{n-1}-x_{n-1}^*,x_n^*-x_{n-1}^*\rangle-2\langle x_{n+1}-x_{n+1}^*,x_{n+1}^*-x_n^*\rangle.
\end{aligned}
$$
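The identity recalled from Lemma 2 holds for arbitrary vectors. Writing $a=x_n-x_n^*$, $b=x_{n-1}-x_{n-1}^*$ and $c=x_n^*-x_{n-1}^*$, one has $x_n-x_{n-1}=a+c-b$, and expanding $\|a+c-b\|^2$ gives $\langle a,a+c-b\rangle=\frac12\left(\|a\|^2-\|b\|^2+\|a+c-b\|^2-\|c\|^2\right)+\langle b,c\rangle$. This matches the recalled identity when $h_n=\|x_n-x_n^*\|^2$, $\delta_n=\|x_n-x_{n-1}\|^2$ and $\gamma_n^*=\|x_n^*-x_{n-1}^*\|^2$, definitions assumed here because they are not restated in this part of the appendix. The sketch below checks, on random vectors, both the identity and the substitution that turns the first bound on $w_{n+1}$ into the second.

```python
import random


def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def sub(u, v):
    """Componentwise difference u - v."""
    return [a - b for a, b in zip(u, v)]


def lemma2_and_substitution_gaps(dim=4, seed=1):
    """Return (gap_identity, gap_substitution) for random vectors.

    Assumed (not restated in the appendix): h_k = ||x_k - x_k^*||^2,
    delta_n = ||x_n - x_{n-1}||^2, gamma_k^* = ||x_k^* - x_{k-1}^*||^2.
    """
    rng = random.Random(seed)

    def vec():
        return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

    def sq(u):
        return dot(u, u)

    x_prev, x, x_next = vec(), vec(), vec()
    xs_prev, xs, xs_next = vec(), vec(), vec()  # stand-ins for x_{n-1}^*, x_n^*, x_{n+1}^*
    alpha = rng.uniform(0.0, 1.0)               # stand-in for alpha_n

    h_prev, h_n, h_next = sq(sub(x_prev, xs_prev)), sq(sub(x, xs)), sq(sub(x_next, xs_next))
    delta_n = sq(sub(x, x_prev))
    gamma_n, gamma_next = sq(sub(xs, xs_prev)), sq(sub(xs_next, xs))
    cross = dot(sub(x, xs), sub(x, x_prev))                # <x_n - x_n^*, x_n - x_{n-1}>
    ip_prev = dot(sub(x_prev, xs_prev), sub(xs, xs_prev))  # <x_{n-1} - x_{n-1}^*, x_n^* - x_{n-1}^*>
    ip_next = dot(sub(x_next, xs_next), sub(xs_next, xs))  # <x_{n+1} - x_{n+1}^*, x_{n+1}^* - x_n^*>

    # Identity recalled from Lemma 2.
    gap_identity = abs(cross - (0.5 * (h_n - h_prev + delta_n - gamma_n) + ip_prev))

    # Right-hand side of the bound before and after substituting the identity.
    before = h_n + alpha**2 * delta_n - h_next - gamma_next + 2 * alpha * cross - 2 * ip_next
    after = ((1 + alpha) * h_n + (alpha**2 + alpha) * delta_n - alpha * h_prev - h_next
             - gamma_next - alpha * gamma_n + 2 * alpha * ip_prev - 2 * ip_next)
    return gap_identity, abs(before - after)


print(lemma2_and_substitution_gaps())
```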

A.5 Proof of Lemma 51

Let $(x_n)_{n\in\mathbb{N}}$ be the sequence provided by (V-FISTA) and $s=\frac{1}{L}$. We can write the Lyapunov energy $(\mathcal{E}_n)_{n\in\mathbb{N}}$ in the following way:

$$
\mathcal{E}_n=w_n+(1-\lambda)\delta_n+\lambda(h_n-h_{n-1})+\lambda^2h_{n-1}+\lambda\gamma_n^*+2\lambda\langle x_n-x_n^*,x_n^*-x_{n-1}^*\rangle.
$$

Since $\langle x_n-x_n^*,x_n^*-x_{n-1}^*\rangle\geqslant 0$, we can write that:

$$
\mathcal{E}_n\geqslant w_n+\lambda(h_n-h_{n-1}),
$$

which leads to the final result.
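The sign condition used above is the standard variational characterization of the Euclidean projection $P_C$ onto a closed convex set $C$: $\langle x-P_C(x),y-P_C(x)\rangle\leqslant 0$ for every $y\in C$. Assuming, as the argument suggests, that $x_n^*$ denotes the projection of $x_n$ onto the closed convex set of minimizers, the choice $y=x_{n-1}^*$ gives $\langle x_n-x_n^*,x_n^*-x_{n-1}^*\rangle\geqslant 0$. The sketch below illustrates this property with a hypothetical box $C=[-1,1]^d$ standing in for the set of minimizers, chosen only because its projection is a coordinatewise clip.

```python
import random


def project_box(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^d (coordinatewise clipping)."""
    return [min(max(t, lo), hi) for t in x]


def min_projection_inner_product(dim=3, trials=1000, seed=2):
    """Smallest value of <x - P(x), P(x) - y> over random x and random y in the box.

    For a projection onto a closed convex set this quantity is never negative.
    The box is a hypothetical stand-in for the set of minimizers.
    """
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        x = [rng.uniform(-3.0, 3.0) for _ in range(dim)]  # plays the role of x_n
        y = [rng.uniform(-1.0, 1.0) for _ in range(dim)]  # a point of C, like x_{n-1}^*
        p = project_box(x)                                # plays the role of x_n^*
        ip = sum((a - b) * (b - c) for a, b, c in zip(x, p, y))
        worst = min(worst, ip)
    return worst


print(min_projection_inner_product())
```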

A.6 Proof of Lemma 67

Let $(x_n)_{n\in\mathbb{N}}$ be the sequence provided by (V-FISTA). By using the expression (65) of $\mathcal{E}_n$, we get that:

$$
\begin{aligned}
\mathcal{E}_{n+1}-\mathcal{E}_n=\;&w_{n+1}-w_n+\lambda\left(\alpha+\lambda\alpha+(1-\alpha)^2\right)(h_{n+1}-h_n)+\alpha(1+\lambda)\delta_{n+1}\\
&-\lambda\alpha(h_n-h_{n-1})-\alpha(1+\lambda)\delta_n-\lambda\alpha\gamma_{n+1}^*+\lambda\alpha\gamma_n^*\\
&+2\lambda\alpha\langle x_n-x_n^*,x_{n+1}^*-x_n^*\rangle-2\lambda\alpha\langle x_{n-1}-x_{n-1}^*,x_n^*-x_{n-1}^*\rangle.
\end{aligned}\tag{99}
$$

The first claim of Lemma 3, combined with the inequality $-\lambda\alpha\gamma_{n+1}^*+2\lambda\alpha\langle x_n-x_n^*,x_{n+1}^*-x_n^*\rangle\leqslant 0$, leads to:

$$
\begin{aligned}
\mathcal{E}_{n+1}-\mathcal{E}_n\leqslant\;&\lambda\left(\alpha+\lambda\alpha+(1-\alpha)^2\right)(h_{n+1}-h_n)+(\alpha+\lambda\alpha-1)\delta_{n+1}\\
&-\lambda\alpha(h_n-h_{n-1})-\alpha(1+\lambda-\alpha)\delta_n+\lambda\alpha\gamma_n^*\\
&-2\lambda\alpha\langle x_{n-1}-x_{n-1}^*,x_n^*-x_{n-1}^*\rangle.
\end{aligned}\tag{100}
$$

According to the second claim of Lemma 3, we have

$$
\begin{aligned}
0\leqslant\;&-\lambda w_{n+1}+\lambda(\alpha+\alpha^2)\delta_n+\lambda\alpha(h_n-h_{n-1})-\lambda(h_{n+1}-h_n)\\
&-\lambda\gamma_{n+1}^*-\lambda\alpha\gamma_n^*+2\lambda\alpha\langle x_{n-1}-x_{n-1}^*,x_n^*-x_{n-1}^*\rangle\\
&-2\lambda\langle x_{n+1}-x_{n+1}^*,x_{n+1}^*-x_n^*\rangle,
\end{aligned}\tag{101}
$$

and, as $\langle x_{n+1}-x_{n+1}^*,x_{n+1}^*-x_n^*\rangle\geqslant 0$ and $\gamma_{n+1}^*\geqslant 0$, adding (101) to (100) gives

$$
\begin{aligned}
\mathcal{E}_{n+1}-\mathcal{E}_n\leqslant\;&-\lambda w_{n+1}+\lambda\left(\alpha+\lambda\alpha+(1-\alpha)^2-1\right)(h_{n+1}-h_n)\\
&+(\alpha+\lambda\alpha-1)\delta_{n+1}-\alpha(1-\alpha-\lambda\alpha)\delta_n.
\end{aligned}\tag{102}
$$

Expanding $\alpha+\lambda\alpha+(1-\alpha)^2-1=\alpha(\alpha+\lambda-1)$ yields the conclusion.
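The bookkeeping from (100) and (101) to (102) can be checked mechanically. The sketch below is an illustration, not part of the original proof: it treats $w_{n+1}$, the $h$'s, $\delta$'s, $\gamma^*$'s and the two inner products as arbitrary rational numbers, adds the right-hand sides of (100) and (101), adds back the two terms $-\lambda\gamma_{n+1}^*$ and $-2\lambda\langle x_{n+1}-x_{n+1}^*,x_{n+1}^*-x_n^*\rangle$ that are discarded as nonpositive, and compares the result with the right-hand side of (102) in exact arithmetic. It also checks the factorization $\alpha+\lambda\alpha+(1-\alpha)^2-1=\alpha(\alpha+\lambda-1)$.

```python
import random
from fractions import Fraction


def check_102(seed=3, trials=200):
    """Exact check that rhs(100) + rhs(101) + (the two discarded terms) == rhs(102).

    Every symbol is an independent random rational, so this tests only the
    coefficient algebra of the proof, not any property of V-FISTA.
    """
    rng = random.Random(seed)

    def rand():
        return Fraction(rng.randint(-50, 50), rng.randint(1, 20))

    for _ in range(trials):
        a, lam = rand(), rand()               # alpha and lambda
        w1 = rand()                           # w_{n+1}
        h0, h1, h2 = rand(), rand(), rand()   # h_{n-1}, h_n, h_{n+1}
        d1, d2 = rand(), rand()               # delta_n, delta_{n+1}
        g1, g2 = rand(), rand()               # gamma_n^*, gamma_{n+1}^*
        ip_prev, ip_next = rand(), rand()     # the two inner products appearing in (101)

        rhs100 = (lam * (a + lam * a + (1 - a) ** 2) * (h2 - h1) + (a + lam * a - 1) * d2
                  - lam * a * (h1 - h0) - a * (1 + lam - a) * d1 + lam * a * g1
                  - 2 * lam * a * ip_prev)
        rhs101 = (-lam * w1 + lam * (a + a ** 2) * d1 + lam * a * (h1 - h0) - lam * (h2 - h1)
                  - lam * g2 - lam * a * g1 + 2 * lam * a * ip_prev - 2 * lam * ip_next)
        rhs102 = (-lam * w1 + lam * (a + lam * a + (1 - a) ** 2 - 1) * (h2 - h1)
                  + (a + lam * a - 1) * d2 - a * (1 - a - lam * a) * d1)

        if rhs100 + rhs101 + lam * g2 + 2 * lam * ip_next != rhs102:
            return False
        if a + lam * a + (1 - a) ** 2 - 1 != a * (a + lam - 1):
            return False
    return True


print(check_102())  # True
```

Using `Fraction` rather than floats makes the comparison exact, so a single mismatched coefficient would be detected rather than hidden by round-off.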

Acknowledgements

JFA acknowledges support of the EU Horizon 2020 research and innovation program under the Marie Skłodowska-Curie NoMADS grant agreement No 777826, and PEPR PDE-AI. HL acknowledges the financial support of the Ministry of Education, University and Research (grant ML4IP R205T7J2KP). This work was supported by the ANR MICROBLIND (grant ANR-21-CE48-0008) and the ANR Masdol (grant ANR-PRC-CE23).

References

  • [1] T. Alamo, P. Krupa, and D. Limon. Restart of accelerated first-order methods with linear convergence under a quadratic functional growth condition. IEEE Transactions on Automatic Control, 68(1):612–619, 2022.
  • [2] V. Apidopoulos, N. Ginatta, and S. Villa. Convergence rates for the Heavy-Ball continuous dynamics for non-convex optimization, under Polyak–Łojasiewicz condition. Journal of Global Optimization, pages 1–27, 2022.
  • [3] H. Attouch, X. Goudou, and P. Redont. The Heavy-Ball with friction method, I. The continuous dynamical system: global exploration of the local minima of a real-valued function by asymptotic analysis of a dissipative dynamical system. Communications in Contemporary Mathematics, 2(01):1–34, 2000.
  • [4] H. Attouch and J. Peypouquet. The rate of convergence of Nesterov’s accelerated forward-backward method is actually faster than $1/k^2$. SIAM Journal on Optimization, 26(3):1824–1834, 2016.
  • [5] J.-F. Aujol, L. Calatroni, C. Dossal, H. Labarrière, and A. Rondepierre. Parameter-free FISTA by adaptive restart and backtracking. arXiv preprint arXiv:2307.14323, 2023.
  • [6] J.-F. Aujol, C. Dossal, H. Labarrière, and A. Rondepierre. FISTA restart using an automatic estimation of the growth parameter. HAL preprint 03153525, May 2022.
  • [7] J.-F. Aujol, C. Dossal, and A. Rondepierre. Convergence rates of the Heavy Ball method for quasi-strongly convex optimization. SIAM Journal on Optimization, 32(3):1817–1842, 2022.
  • [8] J.-F. Aujol, C. Dossal, and A. Rondepierre. Convergence rates of the Heavy-Ball method under the Łojasiewicz property. Mathematical Programming, pages 1–60, 2022.
  • [9] J.-F. Aujol, C. Dossal, and A. Rondepierre. FISTA is an automatic geometrically optimized algorithm for strongly convex functions. Mathematical Programming, 204(1-2), 2024.
  • [10] A. Beck. First-order methods in optimization. SIAM, 2017.
  • [11] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
  • [12] P. Bégout, J. Bolte, and M. A. Jendoubi. On damped second-order gradient systems. Journal of Differential Equations, 259(7):3115–3143, 2015.
  • [13] J. Bolte, A. Daniilidis, A. Lewis, and M. Shiota. Clarke subgradients of stratifiable functions. SIAM Journal on Optimization, 18(2):556–572, 2007.
  • [14] J. F. Bonnans, R. Cominetti, and A. Shapiro. Sensitivity analysis of optimization problems under second order regular constraints. Mathematics of Operations Research, 23(4):806–831, 1998.
  • [15] S. Bubeck, Y. T. Lee, and M. Singh. A geometric alternative to Nesterov’s accelerated gradient descent. arXiv preprint arXiv:1506.08187, 2015.
  • [16] A. Chambolle and C. Dossal. On the convergence of the iterates of the “fast iterative shrinkage/thresholding algorithm”. Journal of Optimization Theory and Applications, 166(3):968–982, 2015.
  • [17] S. Chen, S. Ma, and W. Liu. Geometric descent method for convex composite minimization. Advances in Neural Information Processing Systems, 30, 2017.
  • [18] Y. Drori and M. Teboulle. Performance of first-order methods for smooth convex minimization: a novel approach. Mathematical Programming, 145(1):451–482, June 2014.
  • [19] O. Fercoq and Z. Qu. Adaptive restart of accelerated gradient methods under local quadratic growth condition. IMA Journal of Numerical Analysis, 39(4):2069–2095, 2019.
  • [20] G. Garrigos, L. Rosasco, and S. Villa. Convergence of the forward-backward algorithm: beyond the worst-case with the help of geometry. Mathematical Programming, pages 1–60, 2022.
  • [21] E. Ghadimi, H. R. Feyzmahdavian, and M. Johansson. Global convergence of the Heavy-Ball method for convex optimization. In 2015 European Control Conference (ECC), pages 310–315. IEEE, 2015.
  • [22] P. Giselsson and S. Boyd. Monotonicity and restart in fast gradient methods. In 53rd IEEE Conference on Decision and Control, pages 5058–5063. IEEE, 2014.
  • [23] J.-B. Hiriart-Urruty. At what points is the projection mapping differentiable? The American Mathematical Monthly, 89(7):456–458, 1982.
  • [24] S. Łojasiewicz. Une propriété topologique des sous-ensembles analytiques réels. In Les Équations aux Dérivées Partielles (Paris, 1962), pages 87–89. Éditions du Centre National de la Recherche Scientifique, Paris, 1963.
  • [25] S. Łojasiewicz. Sur la géométrie semi- et sous-analytique. Annales de l’Institut Fourier. Université de Grenoble, 43(5):1575–1595, 1993.
  • [26] I. Necoara, Y. Nesterov, and F. Glineur. Linear convergence of first order methods for non-strongly convex optimization. Mathematical Programming, 175(1):69–107, 2019.
  • [27] Y. Nesterov. A method of solving a convex programming problem with convergence rate $O(1/k^2)$. In Sov. Math. Dokl., volume 27, 1983.
  • [28] Y. Nesterov. Introductory lectures on convex optimization: A basic course, volume 87. Springer Science & Business Media, 2003.
  • [29] Y. Nesterov. Gradient methods for minimizing composite functions. Mathematical Programming, 140(1):125–161, 2013.
  • [30] B. O’Donoghue and E. Candès. Adaptive restart for accelerated gradient schemes. Foundations of Computational Mathematics, 15(3):715–732, 2015.
  • [31] B. Polyak and P. Shcherbakov. Lyapunov functions: An optimization theory perspective. IFAC-PapersOnLine, 50(1):7456–7461, 2017. 20th IFAC World Congress.
  • [32] B. T. Polyak. Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics, 4(5):1–17, 1964.
  • [33] J. Renegar and B. Grimmer. A simple nearly optimal restart scheme for speeding up first-order methods. Foundations of Computational Mathematics, 22(1):211–256, 2022.
  • [34] A. Shapiro. Differentiability properties of metric projections onto convex sets. Journal of Optimization Theory and Applications, 169(3):953–964, 2016.
  • [35] J. W. Siegel. Accelerated first-order methods: Differential equations and Lyapunov functions. arXiv preprint arXiv:1903.05671, 2019.
  • [36] A. Taylor and Y. Drori. An optimal gradient method for smooth strongly convex minimization. Mathematical Programming, pages 1–38, 2022.
  • [37] B. Van Scoy, R. A. Freeman, and K. M. Lynch. The fastest known globally convergent first-order method for minimizing strongly convex functions. IEEE Control Systems Letters, 2(1):49–54, 2017.
  • [38] G. C. Young. A note on derivates and differential coefficients. Acta Mathematica, 37(1):141–154, 1914.