License: arXiv.org perpetual non-exclusive license
arXiv:2306.14932v3 [math.OC] 20 Dec 2023

GloptiNets: Scalable Non-Convex Optimization
with Certificates

Gaspard Beugnot
[email protected]
Inria, École normale supérieure, CNRS, PSL Research University, 75005 Paris, France
Julien Mairal
[email protected]
Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
Alessandro Rudi
[email protected]
Inria, École normale supérieure, CNRS, PSL Research University, 75005 Paris, France
Abstract

We present a novel approach to non-convex optimization with certificates, which handles smooth functions on the hypercube or on the torus. Unlike traditional methods that rely on algebraic properties, our algorithm exploits the regularity of the target function, intrinsic in the decay of its Fourier spectrum. By defining a tractable family of models, we can both obtain precise certificates and leverage the advanced and powerful computational techniques developed to optimize neural networks. In this way, the scalability of our approach is naturally enhanced by parallel computing with GPUs. When applied to polynomials of moderate dimension but with thousands of coefficients, our approach outperforms state-of-the-art optimization methods with certificates, such as those based on Lasserre's hierarchy, addressing problems intractable for the competitors.

1 Introduction

Non-convex optimization is a difficult and crucial task. In this paper, we aim to globally optimize a non-convex function defined on the hypercube while providing a certificate of optimality for the resulting solution. Let $h$ be a smooth function on $[-1,1]^d$. We provide an algorithm that, given $\widehat{x}$, an estimate of the minimizer $x_\star$ of $h$,

$$x_\star = \operatorname*{arg\,min}_{x\in[-1,1]^d} h(x),$$

produces $\epsilon_\delta$, an explicit certificate for the quality of $\widehat{x}$, of the form

$$\left\lvert h(x_\star) - h(\widehat{x})\right\rvert \leq \epsilon_\delta,$$

with probability $1-\delta$. The literature abounds in algorithms for optimizing non-convex functions. Typically they are either (a) heuristics, often very clever but with no guarantee of global convergence Moscato et al. (1989); Horst and Pardalos (2013); (b) variations of algorithms used in convex optimization, which can only guarantee convergence to local minima Boyd and Vandenberghe (2004); or (c) algorithms with only asymptotic guarantees of convergence to a global minimum, but no explicit certificates Van Laarhoven et al. (1987). In general, these methods are quite fast at producing a solution but provide no guarantee on its quality: the returned point can be arbitrarily far from the optimum, so they are typically used where unreliable results are acceptable.

On the contrary, there are contexts where an explicit quantification of the incurred error is crucial for the task at hand (finance, engineering, scientific validation, safety-critical scenarios Lasserre (2009)). In these cases, more expensive methods that provide certificates are used, such as polynomial sum-of-squares (poly-SoS) Lasserre (2001, 2009). These techniques are quite powerful since they provide certificates of the form above, often with machine-precision error. However, (a) their applicability is limited, since $h$ must be a multivariate polynomial (possibly sparse, low-degree) known in analytical form, and (b) the resulting algorithm is a semidefinite program over matrices whose size grows very fast with the number of variables and the degree of the polynomial, becoming intractable already in moderate dimensions and degrees.

Our approach builds on and extends the more recent line of work on kernel sum-of-squares, in particular the work of Woodworth et al. (2022) based on Fourier analysis. It mitigates the limitations of poly-SoS methods in both aspects: (a) we can deal with any function $h$ (not necessarily a polynomial) whose Fourier transform is known, and (b) like Woodworth et al. (2022), the resulting algorithm leverages the smoothness of the objective function rather than its algebraic structure, leading to far more compact representations than poly-SoS. Contrary to Woodworth et al. (2022), we fully leverage the power of the certificate, which allows a drastic reduction of the computational cost of the method. Indeed, we cast the minimization as a much smaller problem, similar to the optimization of a small neural network, which, although again non-convex, efficiently produces a good candidate on which we then compute the certificate.

Notably, our focus lies on a posteriori guarantees: we define a family of models that allow for efficient computation of certificates. Once the model structure is established, we have ample flexibility in training the model, offering various possibilities to achieve good certificates in practical scenarios, while still using well-established and effective techniques in the field of deep neural networks (DNN) Goodfellow et al. (2016) to reduce the computational burden of the approach.

Our contributions can be summarized as follows:

  • We propose a new approach to global optimization with certificates which drastically extends the applicability domain allowed by the state of the art, since it can be applied to any function for which we can compute the Fourier transform (not just polynomials).

  • The proposed approach is naturally tailored for GPU computations and provides a refined control of time and memory requirements of the proposed algorithm, contrary to poly-SoS methods (whose complexity scales dramatically and in a rigid way with dimension and degree of the polynomial).

  • From a technical viewpoint, we improve the results in Woodworth et al. (2022) by developing a fast stochastic approach to recover the certificate with high probability (theorem 3), and we generalize the formulation of the problem to allow the use of powerful techniques from DNN, still providing a certificate on the result (section 3, in particular algorithm 1).

  • In practical applications, we are able to provide certificates for functions in moderate dimensions, which surpasses the capabilities of current state-of-the-art techniques. Specifically, as shown in the experiments, we can handle polynomials with thousands of coefficients. This achievement marks an important milestone towards using these models to provide certificates for more challenging real-life problems.

1.1 Previous work

Polynomial SoS.

In the field of certificate-based polynomial optimization, Lasserre's hierarchy plays a pivotal role Lasserre (2001, 2009). This hierarchy employs a sequence of SDP relaxations of increasing size, of order $O(r^d)$ (where $d$ is the dimension of the space and $r$ is a parameter that upper-bounds the degree of the polynomial), which ultimately converges to the optimal solution as $r\to\infty$. While Lasserre's hierarchy is primarily associated with polynomial optimization, its applicability extends beyond this domain: it offers a specific formulation of the more general moment problem, enabling a wide range of applications; see Henrion et al. (2020) for an introduction. For polynomial optimization problems such as eq. 1, a significant amount of research has been dedicated to leveraging problem structure to improve the scalability of the hierarchy. This research has predominantly focused on exploiting very specific sparsity patterns among the variables of the polynomial, which in these restricted scenarios makes it possible to handle instances ranging from a few to thousands of variables Waki et al. (2006); Wang et al. (2021b, a). There have been theoretical results on optimization over the hypercube Bach and Rudi (2023); Laurent and Slot (2022), but no algorithm handles this domain natively. Furthermore, alternative approaches exist that exploit different types of structure, such as the constant trace property Mai et al. (2022).

Kernel SoS.

Kernel Sum of Squares (K-SoS) is an emerging research field that originated from the introduction of a novel parametrization for positive functions in Marteau-Ferey et al. (2020). This approach has found application in various domains, including Optimal Control Berthier et al. (2022), Optimal Transport Muzellec et al. (2021) and modeling probability distributions Rudi and Ciliberto (2021). In the context of function optimization, two types of theoretical results have been explored: a priori guarantees Rudi et al. (2020) and a posteriori guarantees Woodworth et al. (2022). A priori guarantees offer insight into the convergence rate towards a global optimum of the function, giving a rate on the number of parameters and the complexity necessary to optimize a function up to a given error. For example, Rudi et al. (2020) propose a general technique to reach the global optimum, with error $\epsilon$, of a function that is $s$-times differentiable, requiring a number of parameters essentially of order $O(\epsilon^{-d/s})$; this avoids the curse of dimensionality in the rate when the function is very regular, i.e., $s\geq d$, whereas typical black-box optimization algorithms have a complexity that scales as $\epsilon^{-d}$. A posteriori guarantees focus on providing a certificate for the minimum found by the algorithm. In particular, Woodworth et al. (2022) provide both a priori guarantees and a posteriori certificates; however, their model makes it computationally infeasible to provide certificates in dimension larger than $2$.

To conclude, approaches based on kernel SoS make it possible to extend global optimization with certificates to a wider family of functions and to exploit finer regularity properties than just the number of variables and the degree of a polynomial. By comparison, we focus on making the optimization amenable to high-performance GPU computation while retaining an a posteriori certificate of optimality.

2 Computing certificates with extended k-SoS

Without loss of generality (see the remark below), and with the goal of simplifying the analysis and using powerful tools from harmonic analysis, we cast the problem as the minimization of a periodic function $f$ over the torus $[0,1]^d$ (which we also denote by $\mathbb{T}^d$). In particular, we are interested in minimizing periodic functions whose Fourier coefficients we know (or can easily compute), i.e.

$$f_\star = \min_{z\in\mathbb{T}^d} f(z), \qquad f(z) = \sum_{\omega\in\mathbb{Z}^d} \widehat{f}_\omega e^{2\pi\mathrm{i}\,\omega\cdot z}, \quad \forall z\in\mathbb{T}^d, \tag{1}$$

where $\mathbb{Z}$ is the set of integers. This setting is already interesting on its own, as it encompasses a large class of smooth functions. It notably includes trigonometric polynomials, i.e. functions with only a finite number of non-zero Fourier coefficients $\widehat{f}_\omega$. Optimization of trigonometric polynomials arises in multiple research areas, such as optimal power flow Van Hentenryck (18) or quantum mechanics Hilling and Sudbery (2010). Note that this problem is already NP-hard, as it encompasses for instance the Max-Cut problem Waldspurger et al. (2013). Beyond trigonometric polynomials, we consider the more general case where we can evaluate $f$ and its Fourier coefficients $\widehat{f}_\omega$, and we have access to its norm in a certain Hilbert space. This norm can be computed numerically for trigonometric polynomials; more generally, it reflects the regularity (degree of differentiability) of the function, and thus the difficulty of the problem.
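To make eq. 1 concrete, the following minimal Python sketch stores a trigonometric polynomial as a dictionary from integer frequency vectors $\omega$ to coefficients $\widehat{f}_\omega$, evaluates it pointwise, and runs a crude grid search as a stand-in for the fast, guarantee-free step that produces a candidate $\widehat{x}$. It is an illustration under our own assumptions, not the authors' implementation; the helper names `eval_trig_poly` and `grid_minimize` and the example polynomial are hypothetical.

```python
import cmath
import itertools

def eval_trig_poly(coeffs, z):
    """Evaluate f(z) = sum_w coeffs[w] * exp(2*pi*i*<w, z>) at a point z of the torus [0,1]^d."""
    total = 0j
    for w, c in coeffs.items():
        phase = sum(wi * zi for wi, zi in zip(w, z))
        total += c * cmath.exp(2j * cmath.pi * phase)
    # For a real-valued f, coeffs[-w] == conj(coeffs[w]), so the imaginary part is ~0.
    return total.real

def grid_minimize(coeffs, d, n):
    """Heuristic only: evaluate f on a regular n^d grid and return the best point (no guarantee)."""
    best_z, best_val = None, float("inf")
    for idx in itertools.product(range(n), repeat=d):
        z = tuple(i / n for i in idx)
        val = eval_trig_poly(coeffs, z)
        if val < best_val:
            best_z, best_val = z, val
    return best_z, best_val

# Hypothetical example: f(z) = cos(2*pi*z1) + 0.5*cos(2*pi*(z1 - z2)) via its Fourier coefficients.
coeffs = {
    (1, 0): 0.5, (-1, 0): 0.5,
    (1, -1): 0.25, (-1, 1): 0.25,
}
z_hat, f_hat = grid_minimize(coeffs, d=2, n=20)
print(z_hat, f_hat)
```

Here the minimizer $(0.5, 0)$ happens to lie on the grid, so the search finds the minimum $-1.5$; in general a grid search costs $n^d$ evaluations and provides no certificate on the returned point.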

Remark 1 (No loss of generality in working on the torus).

Given a (non-periodic) function $h:[-1,1]^d\to\mathbb{R}$ with minimum $h_* = \min_{x\in[-1,1]^d} h(x)$, we can obtain a periodic function whose minimum is exactly $h_*$ and from which we can recover $x_\star$. Indeed, following the classical Chebyshev construction, define $\cos(2\pi z)$ as the componentwise application of $\cos$ to the elements of $2\pi z$, i.e. $\cos(2\pi z) := (\cos(2\pi z_1),\dots,\cos(2\pi z_d))$, and define $f(z) := h(\cos(2\pi z))$ for $z\in[0,1]^d$. It is immediate to see that (a) $f$ is periodic, and (b) since $z\mapsto\cos(2\pi z)$ maps $[0,1]^d$ onto $[-1,1]^d$ (it is surjective, and already bijective from $[0,1/2]^d$), we have $h_* = h(x_\star) = f(z_\star)$, where

$$x_\star = \cos(2\pi z_\star), \quad\text{and}\quad z_\star \in \operatorname*{arg\,min}_{z\in\mathbb{T}^d} f(z).$$

We discuss an efficient representation of these problems in section 3.3.
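The construction of Remark 1 can be checked numerically. The Python sketch below uses a hypothetical quadratic $h$ whose minimizer $(0.3, -0.2)$ is known, lifts it to $f(z) = h(\cos(2\pi z))$, minimizes $f$ by brute force on a grid of the torus, and maps the result back with $x = \cos(2\pi z)$. The grid search is only an illustrative heuristic, and the grid resolution is an arbitrary choice; several $z$ (for example $z$ and $1 - z$) map to the same $x$, which does not affect the recovered minimizer.

```python
import itertools
import math

def h(x):
    # Hypothetical objective on [-1, 1]^2 with minimizer (0.3, -0.2); purely illustrative.
    return (x[0] - 0.3) ** 2 + (x[1] + 0.2) ** 2

def cheb_map(z):
    # Componentwise z -> cos(2*pi*z): maps the torus [0, 1]^d onto [-1, 1]^d.
    return tuple(math.cos(2 * math.pi * zi) for zi in z)

def f(z):
    # Periodic lift of h: f(z + k) == f(z) for every integer vector k.
    return h(cheb_map(z))

n = 200  # grid resolution per coordinate (illustrative choice)
grid = [tuple(i / n for i in idx) for idx in itertools.product(range(n), repeat=2)]
z_star = min(grid, key=f)      # heuristic minimizer of f on the grid
x_star = cheb_map(z_star)      # recovered point in [-1, 1]^2
print(f(z_star), x_star)
```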

2.1 Certificates for global optimization and k-SoS

A general recipe for obtaining certificates was developed in Woodworth et al. (2022), which in particular derives the following bound (Woodworth et al., 2022, Thm. 2):

$$f_\star \geq \sup_{c\in\mathbb{R},\; g\in\mathcal{G}_+} c - \left\lVert f - c - g\right\rVert_F, \tag{2}$$

where $\left\lVert u\right\rVert_F$ is the $\ell_1$ norm of the Fourier coefficients of a periodic function $u$, i.e.

$$\left\lVert u\right\rVert_F := \sum_{\omega\in\mathbb{Z}^d} \left\lvert \widehat{u}_\omega\right\rvert, \tag{3}$$

and the supremum is taken over $\mathcal{G}_+$, a class of non-negative functions. Woodworth et al. (2022) then choose $\mathcal{G}_+$ to be the set of positive semidefinite models, leading to a possibly expensive convex SDP problem. Our approach instead starts from two observations: (a) the lower bound in eq. 2 holds for any set $\mathcal{G}_+$ of non-negative functions, not necessarily convex; and (b) any candidate pair $(c, g)$ in the supremum of eq. 2 yields the lower bound $c - \lVert f - c - g\rVert_F$ on $f_\star$, so there is no need to solve eq. 2 exactly. This yields the following theorem.

Theorem 1.

Given a point $\widehat{x}\in\mathbb{T}^d$ and a non-negative periodic function $g_0:\mathbb{T}^d\to\mathbb{R}_+$, we have

$$\left\lvert f(\widehat{x}) - f(x_\star)\right\rvert \leq \left\lVert f - f(\widehat{x}) - g_0\right\rVert_F. \tag{4}$$
Proof.

Since $x_\star$ is a minimizer of $f$, we have $f(x_\star)\leq f(\widehat{x})$. Moreover, since $c_0 = f(\widehat{x})$ and $g_0$ are feasible for the r.h.s. of eq. 2 (taking any class $\mathcal{G}_+$ of non-negative functions that contains $g_0$), we have

$$f(\widehat{x}) \geq f(x_\star) \geq \sup_{c\in\mathbb{R},\; g\in\mathcal{G}_+} c - \left\lVert f - c - g\right\rVert_F \geq c_0 - \left\lVert f - c_0 - g_0\right\rVert_F,$$

from which we derive $0 \leq f(\widehat{x}) - f(x_\star) \leq \left\lVert f - f(\widehat{x}) - g_0\right\rVert_F$. ∎
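Theorem 1 can be illustrated on a one-dimensional toy problem. The bound rests on $\sup_z |u(z)| \leq \lVert u\rVert_F$, which follows from the triangle inequality applied to the Fourier series of $u$. The Python sketch below (hypothetical helper names; trigonometric polynomials stored as dictionaries of Fourier coefficients) computes the right-hand side of eq. 4 for $f(z) = \cos(2\pi z)$, whose minimum is $-1$ at $z = 1/2$. It shows that the certificate depends heavily on $g_0$: the non-negative choice $g_0 = 1 + \cos(2\pi z)$ gives a certificate of $0$ at the true minimizer, the trivial choice $g_0 = 0$ gives a valid but loose certificate of $2$, and at the suboptimal point $\widehat{x} = 0.4$ the first choice returns exactly the gap $f(\widehat{x}) - f_\star$.

```python
import math

def f_norm(coeffs):
    """F-norm of eq. 3: l1 norm of the Fourier coefficients (dict: frequency tuple -> coefficient)."""
    return sum(abs(c) for c in coeffs.values())

def combine(a, b, scale_b=-1.0):
    """Fourier coefficients of a + scale_b * b (default: a - b)."""
    out = dict(a)
    for w, c in b.items():
        out[w] = out.get(w, 0.0) + scale_b * c
    return out

def certificate(f_coeffs, f_at_xhat, g0_coeffs):
    """Right-hand side of eq. 4: ||f - f(x_hat) - g0||_F (a constant sits at frequency 0)."""
    u = combine(f_coeffs, g0_coeffs)
    u = combine(u, {(0,): f_at_xhat})
    return f_norm(u)

# f(z) = cos(2*pi*z): coefficients 1/2 at w = +1 and w = -1, minimum -1 at z = 1/2.
f_coeffs = {(1,): 0.5, (-1,): 0.5}
g0_good = {(0,): 1.0, (1,): 0.5, (-1,): 0.5}  # g0 = 1 + cos(2*pi*z) >= 0

eps_good = certificate(f_coeffs, math.cos(2 * math.pi * 0.5), g0_good)  # x_hat = 1/2 (optimal)
eps_trivial = certificate(f_coeffs, math.cos(2 * math.pi * 0.5), {})    # g0 = 0: valid but loose
f_sub = math.cos(2 * math.pi * 0.4)                                     # x_hat = 0.4 (suboptimal)
eps_suboptimal = certificate(f_coeffs, f_sub, g0_good)

# The certified lower bound f(x_hat) - eps holds on a fine grid of the torus.
grid_min = min(math.cos(2 * math.pi * i / 1000) for i in range(1000))
print(eps_good, eps_trivial, eps_suboptimal, f_sub - eps_suboptimal <= grid_min + 1e-12)
```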

Since any good candidate $g_0$ suffices to produce a certificate, we consider the following class of non-negative functions, which can be seen as a two-layer neural network.

Definition 1 (extended K-SoS model on the torus).

Let $K:\mathbb{T}^d\times\mathbb{T}^d\to\mathbb{R}$ be periodic in its first variable and let $m,r\in\mathbb{N}$. Given a set of anchors $\mathbf{Z} = (\mathbf{z}_1,\dots,\mathbf{z}_m)\subset\mathbb{T}^d$ and a matrix $R\in\mathbb{R}^{r\times m}$, we define the K-SoS model $g$ by

$$\forall\mathbf{x}\in\mathbb{T}^d,\quad g(\mathbf{x}) = \left\lVert R K_{\mathbf{Z}}(\mathbf{x})\right\rVert_2^2, \quad\text{and}\quad K_{\mathbf{Z}}(\mathbf{x}) = (K(\mathbf{x},\mathbf{z}_1),\dots,K(\mathbf{x},\mathbf{z}_m))\in\mathbb{R}^m. \tag{5}$$
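Evaluating eq. 5 only requires $m$ kernel evaluations and a matrix-vector product. The Python sketch below is a minimal illustration under an assumption of ours: the kernel is a product of one-dimensional Poisson-type profiles $q_1(t) = (1-\rho)^2 / (1 - 2\rho\cos(2\pi t) + \rho^2)$, whose Fourier coefficients $\frac{1-\rho}{1+\rho}\rho^{|k|}$ are positive and sum to $q_1(0) = 1$, so it meets the requirements of definition 2 below. This is not necessarily the kernel used by the authors, and the names `q1`, `kernel`, `ksos_eval` and the parameter `rho` are hypothetical.

```python
import math
import random

def q1(t, rho=0.5):
    """Hypothetical 1-D profile: Fourier coefficients ((1-rho)/(1+rho)) * rho^|k| > 0, and q1(0) = 1."""
    return (1 - rho) ** 2 / (1 - 2 * rho * math.cos(2 * math.pi * t) + rho ** 2)

def kernel(x, z, rho=0.5):
    """Shift-invariant product kernel K(x, z) = prod_j q1(x_j - z_j), periodic in x."""
    return math.prod(q1(xj - zj, rho) for xj, zj in zip(x, z))

def ksos_eval(x, R, Z):
    """Extended K-SoS model (eq. 5): g(x) = ||R K_Z(x)||_2^2 with R of shape r x m."""
    k_vec = [kernel(x, z) for z in Z]  # K_Z(x) in R^m
    Rk = [sum(r_ij * k_j for r_ij, k_j in zip(row, k_vec)) for row in R]
    return sum(v * v for v in Rk)

rng = random.Random(0)
d, m, r = 3, 5, 2
Z = [[rng.random() for _ in range(d)] for _ in range(m)]         # anchors: free parameters on the torus
R = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(r)]  # unconstrained r x m matrix
x = [0.1, 0.7, 0.4]
print(ksos_eval(x, R, Z))
```

Any $R$ and any anchors give a non-negative, periodic $g$, which is what allows both to be optimized freely.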

The functions represented by this model are non-negative and periodic. The model extends the k-SoS model of Marteau-Ferey et al. (2020), in which the points $(\mathbf{z}_1,\dots,\mathbf{z}_m)$ are fixed and cannot be optimized. At the expense of convexity in the parameters, it has the following benefits:

  1. The extended k-SoS models benefit from the good approximation properties of k-SoS models described in Marteau-Ferey et al. (2020) and especially Rudi and Ciliberto (2021), since they form a superset of k-SoS models, which have optimal approximation properties for non-negative functions.

  2. The extended model can have a reduced number of parameters, by choosing a matrix $R$ with $r=1$ or $r\ll m$. This drastically reduces the cost of the optimization without impacting the approximation properties of the model, since a good approximation is still possible with $r$ already proportional to $d$ (Rudi et al., 2020, Thm. 3).

  3. The extended model does not require any positive semidefinite constraint on the matrix (contrary to the base model), a constraint that is a well-known bottleneck when scaling up the optimization in the number of parameters Marteau-Ferey et al. (2020). In the extended model we trade the positive semidefinite constraint for non-convexity. However, this allows us to use all the advanced and effective techniques known for unconstrained (or box-constrained) non-convex optimization of (two-layer) neural networks Goodfellow et al. (2016).
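Points 2 and 3 can be made concrete: for fixed anchors, $g(\mathbf{x}) = \lVert R k\rVert_2^2 = k^\top (R^\top R)\, k$ with $k = K_{\mathbf{Z}}(\mathbf{x})$, so the extended model is a positive semidefinite model whose matrix $A = R^\top R$ is PSD by construction and has rank at most $r$; no constraint needs to be enforced on $R$. The short Python check below is our own illustration, with a random vector standing in for $K_{\mathbf{Z}}(\mathbf{x})$; it verifies this identity and compares the number of free entries in $R$ with that of a symmetric $m \times m$ matrix.

```python
import random

def quad_form(A, k):
    """k^T A k for a square matrix A given as a list of rows."""
    n = len(k)
    return sum(A[i][j] * k[i] * k[j] for i in range(n) for j in range(n))

def gram(R):
    """A = R^T R (m x m): PSD by construction, with rank at most r."""
    m = len(R[0])
    return [[sum(row[i] * row[j] for row in R) for j in range(m)] for i in range(m)]

rng = random.Random(1)
m, r = 6, 2
R = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(r)]
k = [rng.random() for _ in range(m)]          # stands in for K_Z(x)

Rk = [sum(a * b for a, b in zip(row, k)) for row in R]
g_factored = sum(v * v for v in Rk)           # ||R k||^2, as in eq. 5
g_psd = quad_form(gram(R), k)                 # k^T A k, the PSD-model form
n_params_extended = r * m                     # entries of R (anchors add m*d more)
n_params_psd = m * (m + 1) // 2               # free entries of a symmetric m x m matrix
print(g_factored, g_psd, n_params_extended, n_params_psd)
```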

To conclude the picture on k-SoS models, a critical aspect of the model is the choice of $K$: it must guarantee good approximation properties, and at the same time its Fourier coefficients must be easy to compute, since we need to evaluate $\lVert f - c - g\rVert_F$. To this aim, good candidates for $K$ are reproducing kernels defined on the torus Steinwart and Christmann (2008). We use shift-invariant kernels, which enable a convenient analysis of the associated RKHS through their Fourier transform.

Definition 2 (Reproducing kernel on the torus).

Let $q$ be a real function on $\mathbb{T}^d$ with positive Fourier transform and $q(0)=1$. Let $K$ be the kernel defined by

$$\forall x,y\in\mathbb{T}^d,\quad K(x,y) = q(x-y) = \sum_{\omega\in\mathbb{Z}^d}\widehat{q}_\omega e^{2\pi\mathrm{i}\,\omega\cdot(x-y)}. \tag{6}$$

Then $K$ is a reproducing kernel bounded by $1$. We denote by $\mathcal{H}$ its reproducing kernel Hilbert space (RKHS) and by $\lVert\cdot\rVert_{\mathcal{H}}$ the associated RKHS norm

$$\left\lVert f\right\rVert_{\mathcal{H}}^2 = \sum_{\omega\in\mathbb{Z}^d} \left\lvert\widehat{f}_\omega\right\rvert^2 / \widehat{q}_\omega.$$

Define $\lambda(x) = q(x)^2$. We assume that we can compute (and sample from, see next section) $\widehat{\lambda}_\omega$, i.e., the Fourier transform of $\lambda$, which equals the discrete convolution $(\widehat{q} * \widehat{q})_\omega$, for all $\omega\in\mathbb{Z}^d$.

With such a choice of $K$, the models inherit the good approximation properties derived in Marteau-Ferey et al. (2020); Rudi and Ciliberto (2021). We recall that shift-invariant reproducing kernels have a non-negative Fourier transform by Bochner's theorem Rudin (1990). That $K$ is bounded by $1$ follows from $|K(x,y)| = |q(x-y)| \leq \sum_\omega \widehat{q}_\omega = q(0) = 1$, since the $\widehat{q}_\omega$ are positive. Finally, note that the Fourier coefficients of an extended k-SoS model can be computed exactly, as shown later in lemma 1.
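For a hypothetical one-dimensional Poisson-type kernel with coefficients $\widehat{q}_k = \frac{1-\rho}{1+\rho}\rho^{|k|}$ (an assumption of ours, not necessarily the authors' choice), the quantities of definition 2 are easy to compute numerically. The Python sketch below evaluates the RKHS norm of $\cos(2\pi z)$, forms $\widehat{\lambda} = \widehat{q} * \widehat{q}$ by a truncated discrete convolution, and checks that both $\widehat{q}$ and $\widehat{\lambda}$ sum to $1$, i.e. to $q(0)$ and $\lambda(0) = q(0)^2$. The truncation level `K_TRUNC` is an arbitrary illustrative choice.

```python
def q_hat(k, rho=0.5):
    """Hypothetical Fourier coefficients of q: ((1-rho)/(1+rho)) * rho^|k|, positive, summing to q(0) = 1."""
    return (1 - rho) / (1 + rho) * rho ** abs(k)

def rkhs_norm_sq(f_coeffs, rho=0.5):
    """||f||_H^2 = sum_k |f_k|^2 / q_k for a 1-D trigonometric polynomial (dict: k -> f_k)."""
    return sum(abs(c) ** 2 / q_hat(k, rho) for k, c in f_coeffs.items())

K_TRUNC = 200  # frequencies beyond this are negligible for rho = 0.5

def lambda_hat(k, rho=0.5):
    """Fourier coefficient of lambda = q^2, i.e. the convolution (q_hat * q_hat)_k, truncated."""
    return sum(q_hat(j, rho) * q_hat(k - j, rho) for j in range(-K_TRUNC, K_TRUNC + 1))

freqs = range(-K_TRUNC, K_TRUNC + 1)
total_q = sum(q_hat(k) for k in freqs)            # close to q(0) = 1, hence |K| <= 1
total_lambda = sum(lambda_hat(k) for k in freqs)  # close to lambda(0) = q(0)^2 = 1
norm_cos = rkhs_norm_sq({1: 0.5, -1: 0.5})        # cos(2*pi*z): coefficients 1/2 at k = +1 and -1
print(total_q, total_lambda, norm_cos)
```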

2.2 Providing certificates with the $F$-norm


[Figure 1 plot: $L_\infty$ norm, $F$-norm, and our bound (Thm. 3) against the number $N$ of sampled frequencies, from $10^3$ to $10^5$; vertical axis from 0.00 to 0.20.]
Figure 1: $f - f_\star$ is a trigonometric polynomial approximated by an extended k-SoS model $g$. The $L_\infty$ norm of the difference (blue) is upper-bounded by the $F$-norm (red), which is itself upper-bounded, with probability $98\%$, by the MoM inequality of theorem 3, shown here as a function of the number $N$ of sampled frequencies. The shaded area shows min/max values across $10$ estimations.

As discussed in the previous section, our approach for providing a certificate on $f$ relies on first obtaining $\widehat{x}$ with a fast algorithm without guarantees, and then approximately solving eq. 2 to obtain the certificate (see theorem 1). To this end, we need an efficient way to compute the norm $\lVert\cdot\rVert_F$. We use a stochastic approach here. Introducing a probability distribution $\widehat{\lambda}_\omega$ on $\mathbb{Z}^d$ (later chosen as a rescaled version of the $\widehat{\lambda}_\omega$ of definition 2), we rewrite the $F$-norm as

$$\left\lVert u\right\rVert_F = \sum_{\omega\in\mathbb{Z}^d} \widehat{\lambda}_\omega\cdot\frac{\left\lvert\widehat{u}_\omega\right\rvert}{\widehat{\lambda}_\omega} = \mathbb{E}_{\omega\sim\widehat{\lambda}}\left[\frac{\left\lvert\widehat{u}_\omega\right\rvert}{\widehat{\lambda}_\omega}\right], \tag{7}$$

which yields an objective amenable to stochastic optimization. Starting from this expression, Woodworth et al. (2022) compute a certificate by truncating the sum to the hypercube $\{\omega : \lVert\omega\rVert_\infty \le N\}$, which contains $(2N+1)^d$ frequencies, and by bounding the remaining terms with a smoothness assumption on $u = f - c - g$ that controls the decay of $\widehat{u}_\omega$. We want to avoid this cost, which is exponential in the dimension, so we proceed differently.

Probabilistic estimates with the $\mathcal{H}$ norm.

Since the $F$-norm can be written as an expectation in eq. 7, we approximate it with an empirical mean $\widehat{S}$ computed from $N$ i.i.d. samples $(\omega_i)_{1\le i\le N}\sim\widehat{\lambda}_\omega$. Note that the variance of $\widehat{S}$ can be upper bounded by a Hilbert norm:

$$\widehat{S} = \frac{1}{N}\sum_{i=1}^N\frac{|\widehat{u}_{\omega_i}|}{\widehat{\lambda}_{\omega_i}}, \quad\text{so that}\quad \mathsf{Var}\,\widehat{S} \le \frac{1}{N}\,\mathbb{E}\left(\frac{|\widehat{u}_\omega|}{\widehat{\lambda}_\omega}\right)^2 = \frac{1}{N}\sum_{\omega\in\mathbb{Z}^d}\frac{|\widehat{u}_\omega|^2}{\widehat{\lambda}_\omega} = \frac{1}{N}\lVert u\rVert_{\mathcal{H}_\lambda}^2, \tag{8}$$

where $\mathcal{H}_\lambda$ is the RKHS from definition 2 with kernel $K(x,x') = \sum_{\omega\in\mathbb{Z}^d}\widehat{\lambda}_\omega e^{2\pi\mathrm{i}\omega\cdot(x-x')}$. This allows us to quantify the deviation of $\widehat{S}$ from $\mathbb{E}[\widehat{S}] = \lVert u\rVert_F$, e.g. with Chebychev's inequality, as shown in the next theorem.
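To make eqs. 7 and 8 concrete, here is a minimal sketch in Python (NumPy) on a hypothetical one-dimensional toy problem: $u$ has finitely many Fourier coefficients on $\{-M,\dots,M\}$, and $\widehat{\lambda}_\omega$ is an arbitrary geometric-type probability on the same support, both chosen for illustration only. It checks empirically that $\widehat{S}$ is unbiased for $\lVert u\rVert_F$ and that its variance stays below $\lVert u\rVert_{\mathcal{H}_\lambda}^2/N$. The names `u_hat`, `lam` and `s_hat_batch` are our own.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 8
omegas = np.arange(-M, M + 1)                                   # toy frequency support {-M, ..., M}
u_hat = rng.normal(size=omegas.size) * 0.7 ** np.abs(omegas)    # hypothetical decaying spectrum
lam = 0.5 ** np.abs(omegas)
lam = lam / lam.sum()                                           # probability distribution on the support

f_norm = np.abs(u_hat).sum()                     # ||u||_F, eq. (7)
h_norm_sq = np.sum(np.abs(u_hat) ** 2 / lam)     # ||u||_{H_lambda}^2, eq. (8)

def s_hat_batch(n_samples, n_repeats, rng):
    """n_repeats independent copies of the empirical mean S_hat, each using n_samples draws."""
    idx = rng.choice(omegas.size, size=(n_repeats, n_samples), p=lam)
    return np.mean(np.abs(u_hat[idx]) / lam[idx], axis=1)

N = 1000
estimates = s_hat_batch(N, 2000, rng)
print(f_norm, estimates.mean())          # unbiased: the two values should be close
print(estimates.var(), h_norm_sq / N)    # eq. (8): empirical variance below the bound
```

Only the magnitudes $|\widehat{u}_\omega|$ enter the estimator, so real toy coefficients are enough here; the exact variance of one term is $\lVert u\rVert_{\mathcal{H}_\lambda}^2 - \lVert u\rVert_F^2$, which explains the slack in eq. 8.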

Theorem 2 (Certificate with Chebychev Inequality).

Let $(\widehat{\lambda}_\omega)_\omega$ be a probability distribution on $\mathbb{Z}^d$, $\delta\in(0,1)$, and $g$ a nonnegative function. Let $N > 0$ and let $\widehat{S}$ be the empirical mean of $|\widehat{f}_\omega - c\,\mathbf{1}_{\omega=0} - \widehat{g}_\omega|/\widehat{\lambda}_\omega$ obtained with $N$ samples $\omega_i\sim\widehat{\lambda}_\omega$. Then, with probability at least $1-\delta$, the following certificate holds:

$$f_\star \ge c - \widehat{S} - \frac{\lVert f-c-g\rVert_{\mathcal{H}_\lambda}}{\sqrt{N\delta}}, \qquad \widehat{S} = \frac{1}{N}\sum_{i=1}^N\frac{|\widehat{f}_{\omega_i} - c\,\mathbf{1}_{\omega_i=0} - \widehat{g}_{\omega_i}|}{\widehat{\lambda}_{\omega_i}}. \tag{9}$$
Proof.

From its definition in eq. 7, $\widehat{S}$ is an unbiased estimator of the $F$-norm. Chebychev's inequality then states that $|\widehat{S} - \lVert u\rVert_F|^2 \le \mathsf{Var}\,\widehat{S}/\delta$ with probability at least $1-\delta$. Using the bound on the variance in eq. 8, it follows that $\lVert u\rVert_F \le \widehat{S} + \lVert f-c-g\rVert_{\mathcal{H}_\lambda}/\sqrt{N\delta}$ with probability at least $1-\delta$. Plugging this expression into eq. 2, we obtain the result. ∎

Note that the norm in $\mathcal{H}_\lambda$ can be expanded as follows (assuming for conciseness that $-c$ is absorbed into the zero-frequency coefficient of $f$):

$$\lVert u\rVert_{\mathcal{H}_\lambda}^2 = \sum_{\omega\in\mathbb{Z}^d}\frac{\widehat{f}_\omega^*(\widehat{f}_\omega - 2\widehat{g}_\omega)}{\widehat{\lambda}_\omega} + \lVert g\rVert_{\mathcal{H}_\lambda}^2 \le \left(\lVert f\rVert_{\mathcal{H}_\lambda} + \lVert g\rVert_{\mathcal{H}_\lambda}\right)^2. \tag{10}$$

Thus, theorem 2 provides a certificate on $f_\star$ as long as we can (i) evaluate the Fourier coefficients $\widehat{g}_\omega$ of $g$ and (ii) compute, or upper bound, its norm in some RKHS $\mathcal{H}_\lambda$ induced by $\widehat{\lambda}_\omega$. In the next section, we detail the choices we make to achieve this efficiently, with kernels amenable to GPU computation that scale to thousands of coefficients.
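The two requirements above can be illustrated with a minimal sketch of eq. 9 on a hypothetical one-dimensional example in which all Fourier coefficients live on a finite support, so that $\lVert f-c-g\rVert_{\mathcal{H}_\lambda}$ can be computed exactly instead of being bounded via eq. 10. We take $f(x) = \cos(2\pi x) + \tfrac12\cos(4\pi x)$, whose minimum is $f_\star = -3/4$, and two illustrative choices of $g$: the exact sum of squares $g = (\cos(2\pi x) + \tfrac12)^2 = f + \tfrac34$, and a perturbed version $0.9\,g$. The function name `chebyshev_certificate` and the choice of $\widehat{\lambda}_\omega$ are our own assumptions.

```python
import numpy as np

def chebyshev_certificate(c, f_hat, g_hat, lam, omegas, n_samples, delta, rng):
    """Lower bound on f_star from eq. (9), valid with probability >= 1 - delta.

    f_hat, g_hat and lam are arrays indexed like `omegas` (finite toy support);
    lam must be a probability vector, positive wherever u_hat is nonzero.
    """
    u_hat = f_hat - c * (omegas == 0) - g_hat              # coefficients of u = f - c - g
    h_norm = np.sqrt(np.sum(np.abs(u_hat) ** 2 / lam))     # exact ||u||_{H_lambda} on this support
    idx = rng.choice(omegas.size, size=n_samples, p=lam)   # omega_i ~ lam
    s_hat = np.mean(np.abs(u_hat[idx]) / lam[idx])
    return c - s_hat - h_norm / np.sqrt(n_samples * delta)

omegas = np.arange(-4, 5)
lam = 0.5 ** np.abs(omegas)
lam = lam / lam.sum()
# f(x) = cos(2 pi x) + 0.5 cos(4 pi x): coefficient 1/2 at +-1 and 1/4 at +-2
f_hat = np.array([0, 0, 0.25, 0.5, 0, 0.5, 0.25, 0, 0])
# g(x) = (cos(2 pi x) + 1/2)^2 = 3/4 + cos(2 pi x) + 0.5 cos(4 pi x), i.e. g = f + 3/4
g_exact = f_hat + 0.75 * (omegas == 0)

rng = np.random.default_rng(0)
cert_exact = chebyshev_certificate(-0.75, f_hat, g_exact, lam, omegas, 1000, 0.05, rng)
cert_perturbed = chebyshev_certificate(-0.75, f_hat, 0.9 * g_exact, lam, omegas, 1000, 0.05, rng)
print(cert_exact, cert_perturbed)   # exactly -0.75 for the exact SoS, strictly below it otherwise
```

With the exact sum of squares, $u = 0$ and the certificate matches $f_\star$; the perturbed model leaves a residual $u = 0.1\,g$, which is paid for both through $\widehat{S}$ and through the $1/\sqrt{N\delta}$ term.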

Remark 2 (Using an RKHS norm instead of the $F$-norm).

Since $(\widehat{\lambda}_\omega)_\omega$ sums to $1$, the associated kernel is bounded by $1$. Hence $\lVert u\rVert_{L_\infty} \le \lVert u\rVert_{\mathcal{H}_\lambda}$, and the latter could be used instead of the $F$-norm in eq. 2. There are two reasons for taking our approach instead. Firstly, the $F$-norm is always tighter than an RKHS norm (see e.g. (Woodworth et al., 2022, Lem. 4)); secondly, we cannot compute $\lVert u\rVert_{\mathcal{H}_\lambda}$ efficiently and have to rely on another upper bound instead. However, since this norm only enters the certificate through a term scaling as $1/\sqrt{N}$, taking the number of samples $N = O(\lVert u\rVert_{\mathcal{H}_\lambda}^2)$ alleviates this issue.
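The chain of inequalities behind remark 2, $\sup_x|u(x)| \le \lVert u\rVert_F \le \lVert u\rVert_{\mathcal{H}_\lambda}$ (the second step being the Cauchy-Schwarz inequality combined with $\sum_\omega\widehat{\lambda}_\omega = 1$), can be checked numerically. The sketch below uses a hypothetical random real trigonometric polynomial in dimension one and an arbitrary choice of $\widehat{\lambda}_\omega$; the supremum is only approximated on a grid, which can only make the left-hand side smaller.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 6
omegas = np.arange(-M, M + 1)
# Conjugate-symmetric coefficients (u_hat[-w] = conj(u_hat[w])) give a real-valued u.
half = rng.normal(size=M + 1) + 1j * rng.normal(size=M + 1)
half[0] = half[0].real
u_hat = np.concatenate([np.conj(half[:0:-1]), half])    # ordered like omegas
lam = 0.6 ** np.abs(omegas)
lam = lam / lam.sum()                                   # probability on {-M, ..., M}

x = np.linspace(0.0, 1.0, 4001)
basis = np.exp(2j * np.pi * np.outer(x, omegas))        # e^{2 pi i w x} on the grid
u = np.real(basis @ u_hat)
sup_norm = np.abs(u).max()                              # grid approximation of ||u||_inf
f_norm = np.abs(u_hat).sum()                            # ||u||_F
h_norm = np.sqrt(np.sum(np.abs(u_hat) ** 2 / lam))      # ||u||_{H_lambda}
print(sup_norm, f_norm, h_norm)                         # nondecreasing order
```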

Exponential concentration bounds with MoM.

The $1/\sqrt{\delta}$ scaling in theorem 2 can be prohibitive if one requires the result to hold with high probability ($\delta\ll 1$). Fortunately, alternative estimators exist for such cases. The median-of-means (MoM) estimator is one example, as illustrated in theorem 3.

Theorem 3 (Certificate with MoM estimator).

Let $(\widehat{\lambda}_\omega)_\omega$ be a probability distribution on $\mathbb{Z}^d$, and $\delta\in(0,1)$. Draw $N > 0$ frequencies $\omega_i\sim\widehat{\lambda}_\omega$. Define the MoM estimator as follows: for $K\in\mathbb{N}$ such that $\delta = e^{-K/8}$ and $N = Kb$ with $b\ge 1$, let $B_1,\dots,B_K$ be a partition of $[N]$ into $K$ blocks of size $b$; then

$$\mathsf{MoM}_\delta\big(|\widehat{u}_{\omega_i}|/\widehat{\lambda}_{\omega_i}\big) = \mathrm{median}\left\{\frac{1}{b}\sum_{i\in B_j}\frac{|\widehat{f}_{\omega_i} - c\,\mathbf{1}_{\omega_i=0} - \widehat{g}_{\omega_i}|}{\widehat{\lambda}_{\omega_i}}\right\}_{1\le j\le K}. \tag{11}$$

Then the following certificate on $f_\star$ holds with probability at least $1-\delta$:

$$f_\star \ge c - \mathsf{MoM}_\delta\big(|\widehat{u}_{\omega_i}|/\widehat{\lambda}_{\omega_i}\big) - 4\sqrt{2}\,\lVert f-c-g\rVert_{\mathcal{H}_\lambda}\sqrt{\frac{\log(1/\delta)}{N}}. \tag{12}$$
Proof.

Using e.g. Theorem 4.1 of Devroye et al. (2016), the deviation of the MoM estimator from its expectation is bounded by

$$\left|\lVert u\rVert_F - \mathsf{MoM}_\delta\big(|\widehat{u}_{\omega_i}|/\widehat{\lambda}_{\omega_i}\big)\right| \le 4\sqrt{2}\sqrt{\mathsf{Var}\big(|\widehat{u}_\omega|/\widehat{\lambda}_\omega\big)\frac{\log(1/\delta)}{N}} \quad\text{with probability at least } 1-\delta. \tag{13}$$

Using the upper bound on the variance by the $\mathcal{H}_\lambda$ norm given in eq. 8 and plugging the resulting expression into eq. 2, we obtain the result. ∎
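The MoM estimator of eq. 11 and the bound of eq. 12 can be sketched as follows, assuming the values $|\widehat{u}_{\omega_i}|/\widehat{\lambda}_{\omega_i}$ have already been computed for $N$ i.i.d. draws. Since $\delta = e^{-K/8}$ rarely yields an integer $K$, this sketch rounds $K$ up and uses $K/8 \ge \log(1/\delta)$ in place of $\log(1/\delta)$, dropping the trailing samples that do not fill a block; these conventions, and the name `mom_certificate`, are our assumptions rather than the paper's implementation.

```python
import numpy as np

def mom_certificate(c, ratios, h_norm_bound, delta):
    """Lower bound on f_star in the spirit of eq. (12), valid with prob. >= 1 - delta.

    ratios: 1-D array of |u_hat(omega_i)| / lam(omega_i) for i.i.d. omega_i ~ lam.
    h_norm_bound: any upper bound on ||f - c - g||_{H_lambda}.
    """
    K = int(np.ceil(8 * np.log(1 / delta)))   # number of blocks, so that e^{-K/8} <= delta
    b = len(ratios) // K                      # block size
    if b < 1:
        raise ValueError("need at least K = ceil(8 log(1/delta)) samples")
    n = K * b                                 # samples actually used, N = K b
    block_means = ratios[:n].reshape(K, b).mean(axis=1)
    mom = np.median(block_means)              # eq. (11)
    return c - mom - 4 * np.sqrt(2) * h_norm_bound * np.sqrt((K / 8) / n)

# Usage on synthetic heavy-tailed values (hypothetical data with mean 1 and E[x^2] = 4).
rng = np.random.default_rng(0)
ratios = rng.pareto(3.0, size=10_000) * 2.0
cert = mom_certificate(c=0.0, ratios=ratios, h_norm_bound=2.0, delta=1e-6)
print(cert)
```

Compared with eq. 9, the price of a smaller $\delta$ now grows like $\sqrt{\log(1/\delta)}$ instead of $1/\sqrt{\delta}$.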

To conclude this section, bounding the $L_\infty$ norm from above by the $F$-norm in eq. 3 yields a certificate on $f$, as shown in theorem 1. The $F$-norm requires an infinite number of computations in general, but it can be bounded efficiently with a probabilistic estimate, given by theorem 2 or theorem 3. This is summed up in fig. 1. Note that the gap $\lVert\cdot\rVert_F - \lVert\cdot\rVert_{L_\infty}$ is a source of conservatism in the certificate, which we do not quantify; yet, the $F$-norm is optimal for a class of norms, see (Woodworth et al., 2022, Lemma 3).

3 Algorithm and implementation

3.1 Bessel kernel

We now detail the specific kernel we choose in order to compute the certificate of theorem 2 or theorem 3 efficiently. Our first requirement is a kernel that is stable under products, so that we can easily characterize a Hilbert space to which the model $g$ belongs. This restricts the choice to the exponential family. We therefore define, for a parameter $s>0$,

$$\forall x\in\mathbb{T},\quad q_s(x) = e^{s(\cos(2\pi x)-1)} = \sum_{\omega\in\mathbb{Z}} e^{-s} I_{|\omega|}(s)\, e^{2\pi\mathrm{i}\omega x}, \tag{14}$$

where $I_{|\omega|}(\cdot)$ is the modified Bessel function of the first kind (Watson, 1922, p. 181). Then, define $K_s(x,y) = q_s(x-y)$ as in definition 2, and take a tensor product to extend the definition of $K$ to multiple dimensions, i.e. $K_{\mathbf{s}}(\mathbf{x},\mathbf{y}) = \prod_{\ell=1}^d K_{\mathbf{s}_\ell}(\mathbf{x}_\ell,\mathbf{y}_\ell)$ for any $\mathbf{x},\mathbf{y}\in\mathbb{T}^d$. We refer to this kernel as the Bessel kernel, and to the associated RKHS as $\mathcal{H}_{\mathbf{s}}$. It is stable under products, as $K_{\mathbf{s}}(x,y) = K_{\mathbf{s}/2}(x,y)^2$.
This property is key to computing the Fourier transform of the model $g$, and contrasts with previous approaches, which used the exponential kernel with $\widehat{q}_\omega \propto e^{-s|\omega|}$ (Woodworth et al., 2022; Bach and Rudi, 2023).
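Both eq. 14 and the product property can be checked numerically. The sketch below (Python with NumPy and SciPy) compares the FFT of $q_s$ sampled on a grid with $e^{-s}I_{|\omega|}(s)$, which SciPy exposes as the exponentially scaled Bessel function `scipy.special.ive`, and verifies $q_s = q_{s/2}^2$ pointwise. The values of $s$ and of the grid size are arbitrary.

```python
import numpy as np
from scipy.special import ive

def q(x, s):
    """q_s(x) = exp(s (cos(2 pi x) - 1)) on the torus, eq. (14)."""
    return np.exp(s * (np.cos(2 * np.pi * x) - 1))

s, n_grid = 3.0, 256
x = np.arange(n_grid) / n_grid
# (1/n) sum_k q(x_k) e^{-2 pi i w x_k} approximates the w-th Fourier coefficient;
# aliasing is negligible because the coefficients decay super-exponentially.
coef_fft = np.fft.fft(q(x, s)) / n_grid
w = np.arange(-10, 11)
coef_bessel = ive(np.abs(w), s)        # ive(v, s) = e^{-s} I_v(s) for s > 0
err = np.max(np.abs(coef_fft[w] - coef_bessel))
print(err)                             # rounding-level discrepancy

# Stability under products: K_s = K_{s/2}^2, i.e. q_s = q_{s/2}^2 pointwise.
print(np.allclose(q(x, s), q(x, s / 2) ** 2))
```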

In the following, $g$ is a K-SoS model defined as in definition 1, with the Bessel kernel of parameter $\mathbf{s}\in\mathbb{R}_+^d$ defined in eq. 14.

Lemma 1 (Fourier coefficient of the Bessel kernel).

For $\omega\in\mathbb{Z}^d$, the Fourier coefficient of $g$ at $\omega$ can be computed in $O(drm^2)$ time with

$$\widehat{g}_\omega = \sum_{i,j=1}^m R_i^\top R_j \prod_{\ell=1}^d e^{-2\mathbf{s}_\ell}\, I_{|\omega_\ell|}\big(2\mathbf{s}_\ell\cos\pi(\mathbf{z}_{i\ell} - \mathbf{z}_{j\ell})\big)\, e^{-\mathrm{i}\pi\omega_\ell(\mathbf{z}_{i\ell} + \mathbf{z}_{j\ell})}. \tag{15}$$
Proof.

From its definition in eq. 5, we rewrite $g$ as

$$g(x) = \sum_{i,j=1}^m R_i^\top R_j \prod_{\ell=1}^d K_{\mathbf{s}_\ell}(x_\ell, \mathbf{z}_{i\ell})\, K_{\mathbf{s}_\ell}(x_\ell, \mathbf{z}_{j\ell}). \tag{16}$$

Now, from the definition of the Bessel kernel in eq. 14, we have for any $(x,y,z)\in\mathbb{T}^3$ that $K_s(x,y)K_s(x,z) = e^{-2s}e^{2s\cos(\pi(y-z))\cos 2\pi(x-(y+z)/2)}$. By definition of the modified Bessel function, the Fourier coefficients of this expression are given by $e^{-2s}I_{|\omega|}\big(2s\cos(\pi(y-z))\big)e^{-\mathrm{i}\pi\omega(y+z)}$, the last factor accounting for the shift by $(y+z)/2$. Using this in eq. 16, we get the result. ∎
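Eq. 15 can be sanity-checked in dimension $d=1$ against an FFT of the model evaluated on a grid. The sketch below assumes the model takes the form $g(x) = \lVert R\,k_{\mathbf{z}}(x)\rVert^2$ with $k_{\mathbf{z}}(x) = (K_s(x,\mathbf{z}_1),\dots,K_s(x,\mathbf{z}_m))$ and $R\in\mathbb{R}^{r\times m}$, which is what eq. 16 expands to; the random $R$, $\mathbf{z}$ and $s$ are placeholders. Because $\cos\pi(\mathbf{z}_i - \mathbf{z}_j)$ can be negative, it evaluates the Bessel factor through the identity $I_n(-a) = (-1)^n I_n(a)$.

```python
import numpy as np
from scipy.special import iv

def g_model(x, R, z, s):
    """1-D model g(x) = ||R k_z(x)||^2 with k_z(x)_i = q_s(x - z_i); expands to eq. (16)."""
    kz = np.exp(s * (np.cos(2 * np.pi * (x[:, None] - z[None, :])) - 1))   # shape (n_x, m)
    return np.sum((kz @ R.T) ** 2, axis=1)

def g_hat(omega, R, z, s):
    """Fourier coefficient of g at the integer omega, following eq. (15) with d = 1."""
    n = abs(omega)
    a = 2 * s * np.cos(np.pi * (z[:, None] - z[None, :]))    # Bessel arguments, may be < 0
    bessel = iv(n, np.abs(a)) * np.sign(a) ** n               # I_n(a) via I_n(-a) = (-1)^n I_n(a)
    phase = np.exp(-1j * np.pi * omega * (z[:, None] + z[None, :]))
    return np.sum((R.T @ R) * np.exp(-2 * s) * bessel * phase)   # (R.T @ R)_ij = R_i^T R_j

rng = np.random.default_rng(0)
r, m, s = 2, 4, 1.5
R, z = rng.normal(size=(r, m)), rng.random(m)

n_grid = 128
x = np.arange(n_grid) / n_grid
coef_fft = np.fft.fft(g_model(x, R, z, s)) / n_grid
err = max(abs(g_hat(w, R, z, s) - coef_fft[w]) for w in range(-6, 7))
print(err)   # rounding-level discrepancy
```

Each call costs $O(m^2)$ Bessel evaluations for a single frequency, in line with the $O(drm^2)$ count of lemma 1.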

The second ingredient needed to use the certificate of theorem 2 is an RKHS norm for $g$. It relies on the inclusion of $\mathcal{H}_{2\mathbf{s}}$ into the bigger space of symmetric operators $\mathcal{S}(\mathcal{H}_{\mathbf{s}})$.

Lemma 2 (Bound on the RKHS norm of $g$).

$g$ belongs to $\mathcal{H}_{2\mathbf{s}}$, and $\lVert g\rVert_{\mathcal{H}_{2\mathbf{s}}}$ is bounded by the Hilbert-Schmidt norm of the operator representing $g$ in $\mathcal{S}(\mathcal{H}_{\mathbf{s}})$, which can be computed in $O(dm^2 + mr^2)$ time, with

$$\lVert g\rVert_{\mathcal{H}_{2\mathbf{s}}}^2 \le \lVert g\rVert_{\mathcal{S}(\mathcal{H}_{\mathbf{s}})}^2 = \operatorname{Tr}\big((R K_{\mathbf{s},\mathbf{z}} R^\top)^2\big). \tag{17}$$
Proof.

Assume that $d=1$; the reasoning extends to multiple dimensions with the tensor product. From the computation of the Fourier coefficients in lemma 1 and the fact that $|I_{|\omega|}(2s\cos(\pi\,\cdot))| \le I_{|\omega|}(2s)$, we have $\widehat{g}_\omega = O(e^{-2s}I_{|\omega|}(2s))$, so that $\sum_\omega |\widehat{g}_\omega|^2/(e^{-2s}I_{|\omega|}(2s)) < \infty$, hence $g\in\mathcal{H}_{2s}$. Finally, since the kernel is stable under products, $\mathcal{H}_{2s} = \mathcal{H}_s\odot\mathcal{H}_s$, so we can use e.g. (Paulsen and Raghupathi, 2016, Thm. 5.16) with $\mathcal{H}_1 = \mathcal{H}_2 = \mathcal{H}_s$ and $\mathcal{S}(\mathcal{H}_s) = \mathcal{H}_s\otimes\mathcal{H}_s$, applied to the operator $A = (\varphi(\mathbf{z}_1),\dots,\varphi(\mathbf{z}_m))R^\top R(\varphi(\mathbf{z}_1),\dots,\varphi(\mathbf{z}_m))^*\in\mathcal{S}(\mathcal{H}_s)$, whose squared Hilbert-Schmidt norm is $\operatorname{Tr}\big((RK_{s,\mathbf{z}}R^\top)^2\big)$. ∎
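The bound of eq. 17 is cheap to evaluate from the kernel matrix of the points $\mathbf{z}$. The sketch below, in dimension $d=1$ and assuming the model has the form $g(x) = \lVert R\,k_{\mathbf{z}}(x)\rVert^2$ (which is what eq. 16 expands to), computes $\operatorname{Tr}((RK_{s,\mathbf{z}}R^\top)^2)$ and compares it with $\lVert g\rVert_{\mathcal{H}_{2s}}^2 = \sum_\omega |\widehat{g}_\omega|^2/\widehat{\lambda}_\omega$, where $\widehat{\lambda}_\omega = e^{-2s}I_{|\omega|}(2s)$ and the $\widehat{g}_\omega$ come from an FFT. The sum is truncated to $|\omega|\le 20$ (an arbitrary choice), because FFT rounding noise divided by the tiny $\widehat{\lambda}_\omega$ at high frequencies would otherwise dominate; truncation can only decrease the left-hand side. All names are hypothetical.

```python
import numpy as np
from scipy.special import ive

def kz_matrix(x, z, s):
    """Bessel-kernel values K_s(x_a, z_i) = q_s(x_a - z_i) in dimension 1."""
    return np.exp(s * (np.cos(2 * np.pi * (x[:, None] - z[None, :])) - 1))

def hs_bound_sq(R, z, s):
    """Right-hand side of eq. (17): Tr((R K R^T)^2) with K = (K_s(z_i, z_j))_ij."""
    M = R @ kz_matrix(z, z, s) @ R.T
    return np.trace(M @ M)

def h2s_norm_sq(R, z, s, n_grid=256, max_freq=20):
    """sum_{|w| <= max_freq} |g_hat_w|^2 / (e^{-2s} I_|w|(2s)), with g_hat from an FFT."""
    x = np.arange(n_grid) / n_grid
    g = np.sum((kz_matrix(x, z, s) @ R.T) ** 2, axis=1)
    g_hat = np.fft.fft(g) / n_grid
    w = np.arange(-max_freq, max_freq + 1)
    lam = ive(np.abs(w), 2 * s)                       # Fourier coefficients of q_{2s}
    return np.sum(np.abs(g_hat[w]) ** 2 / lam)

rng = np.random.default_rng(0)
s = 1.0
R, z = rng.normal(size=(3, 5)), rng.random(5)
print(h2s_norm_sq(R, z, s), hs_bound_sq(R, z, s))     # left <= right, as in eq. (17)

# With a single point (m = 1) the bound is tight: both sides equal ||R_1||^4.
R1, z1 = rng.normal(size=(3, 1)), rng.random(1)
print(h2s_norm_sq(R1, z1, s), hs_bound_sq(R1, z1, s))
```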

With lemma 2, the model $g$ belongs to $\mathcal{H}_{2\mathbf{s}}$, so we naturally use $\widehat{\lambda}_\omega=\prod_{\ell=1}^d e^{-2\mathbf{s}_\ell}I_{\omega_\ell}(2\mathbf{s}_\ell)$ in theorem 2; said differently, the space $\mathcal{H}_\lambda$ introduced in eq. 8 is simply $\mathcal{H}_{2\mathbf{s}}$ defined in eq. 14.
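Since $\sum_{\omega\in\mathbb{Z}}e^{-2s}I_{|\omega|}(2s)=1$ (evaluate the generating function $e^{z\cos\theta}=\sum_{n\in\mathbb{Z}}I_n(z)e^{in\theta}$ at $\theta=0$), $\widehat{\lambda}$ is a product probability distribution on $\mathbb{Z}^d$ and can be sampled one coordinate at a time. A minimal sketch, assuming each coordinate is truncated to $\{-50,\dots,50\}$ and sampled by inverse CDF through `random.choices`; the function names and the values of $\mathbf{s}$ are hypothetical.

```python
import math
import random


def bessel_i(n: int, z: float, tol: float = 1e-17) -> float:
    """I_n(z) for integer n, from the power series sum_k (z/2)^(2k+n) / (k! (k+n)!)."""
    n = abs(n)
    if z == 0.0:
        return 1.0 if n == 0 else 0.0
    half = 0.5 * z
    term = math.exp(n * math.log(half) - math.lgamma(n + 1))
    total, k = term, 0
    while term > tol * total:
        k += 1
        term *= half * half / (k * (k + n))
        total += term
    return total


def lambda_hat_1d(w: int, s_l: float) -> float:
    """One factor e^{-2 s_l} I_|w|(2 s_l) of lambda_hat."""
    return math.exp(-2.0 * s_l) * bessel_i(w, 2.0 * s_l)


def lambda_hat(omega, s) -> float:
    """lambda_hat_omega = prod_l e^{-2 s_l} I_{omega_l}(2 s_l) for omega in Z^d."""
    return math.prod(lambda_hat_1d(w, s_l) for w, s_l in zip(omega, s))


def sample_omega(s, rng: random.Random, cutoff: int = 50):
    """Draw omega ~ lambda_hat coordinate by coordinate (the law is a product),
    each coordinate restricted to the truncated support {-cutoff, ..., cutoff}."""
    support = range(-cutoff, cutoff + 1)
    omega = []
    for s_l in s:
        weights = [lambda_hat_1d(w, s_l) for w in support]
        omega.append(rng.choices(support, weights=weights)[0])
    return tuple(omega)


s = (1.0, 0.5)  # hypothetical smoothness parameters, d = 2
print(sum(lambda_hat_1d(w, 1.0) for w in range(-50, 51)))  # close to 1
rng = random.Random(0)
print([sample_omega(s, rng) for _ in range(5)])
```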

3.2 The algorithm: GloptiNets

We can now describe how GloptiNets yields a certificate on $f$. The key observation is that, no matter how our model $g(R,\mathbf{z})$ from definition 1 is obtained, we can always compute a certificate with theorems 2 and 3. Thus, even though optimizing eq. 9 w.r.t. $(c,R,\mathbf{z})$ is highly non-convex, we can use any optimization routine and check its efficiency empirically by looking at the certificate. Moreover, thanks to its low-rank structure, $g$ is cheaper to evaluate than its Fourier coefficients. This is formally shown in proposition 2 in appendix A, where a block-diagonal structure for the model is also introduced. Another detail of practical importance is that a loss built from function values can be efficiently backpropagated through, whereas the certificate is not easily vectorized and the Bessel functions it involves would require specific approximations to be backpropagated through efficiently. That is why we first optimize $\sup_{c,g}\, c-\lVert f-c-g\rVert_{*}$, where $\lVert\cdot\rVert_{*}$ is a proxy for the $L_\infty$ norm, e.g. the log-sum-exp on a random batch of $N$ points:

$\lVert f-c-g\rVert_{L_\infty}\approx\max_{i\in[N]}\lvert f(x_i)-c-g(x_i)\rvert\approx\mathrm{LSE}\big(f(x_i)-c-g(x_i)\big)_{i\in[N]}. \qquad (18)$
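A numerically stable version of this proxy can be sketched as follows. Two choices are assumptions not stated in the text: a temperature $t$ (the proxy approaches the maximum as $t$ grows), and the inclusion of both $r_i$ and $-r_i$ so that it tracks $\max_i|r_i|$, with $\max_i|r_i|\leq\mathrm{LSE}_t\leq\max_i|r_i|+\log(2N)/t$. The function name `lse_sup_proxy` is hypothetical.

```python
import math


def lse_sup_proxy(residuals, temperature: float = 50.0) -> float:
    """Smooth surrogate of max_i |r_i|: (1/t) log sum_i (exp(t r_i) + exp(-t r_i)).

    Shifting by the largest exponent before exponentiating avoids overflow.
    It satisfies max|r_i| <= value <= max|r_i| + log(2N) / t."""
    scaled = [temperature * r for r in residuals] + [-temperature * r for r in residuals]
    top = max(scaled)
    return (top + math.log(sum(math.exp(v - top) for v in scaled))) / temperature


# residuals f(x_i) - c - g(x_i) on a hypothetical batch of N = 4 points
r = [0.3, -0.7, 0.1, 0.5]
print(max(abs(v) for v in r), lse_sup_proxy(r))
```

In a deep learning library, the same quantity would be computed with the framework's own log-sum-exp so that it can be backpropagated through.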

This optimization can be carried out with any deep learning library offering automatic differentiation, and any flavour of gradient ascent. Only afterwards do we compute the certificate with theorems 2 and 3. This procedure is summed up in algorithm 1.

Data: A trigonometric polynomial $f$, a candidate $z$ s.t. $c=f(z)$, a model $g$, and a probability $\delta$.
Result: A certificate $|f_\star-f(z)|\leq\epsilon_\delta$ with probability $1-\delta$.
/* Optimize $g$ with function values */
for epoch = 1:nepochs do
       Sample $x_1,\dots,x_N\in\mathbb{T}^d$;
       $L,\nabla L=\mathrm{autodiff}\big(\mathrm{LSE}(f(x_i)-c-g(x_i))_{i\in[N]}\big)$;
       $\mathbf{z},R\leftarrow\mathrm{optimizer}(\nabla L)$;
      
/* Compute a certificate */
$\widehat{\lambda}$: probability distribution on $\mathbb{Z}^d$ with $\widehat{\lambda}_\omega=\prod_{\ell=1}^d e^{-2\mathbf{s}_\ell}I_{\omega_\ell}(2\mathbf{s}_\ell)$;
Sample $\Omega=(\omega_1,\dots,\omega_N)\sim\widehat{\lambda}$;
Compute $M=\mathsf{MoM}_\delta\big(|\widehat{f}_{\omega_i}-c\mathbf{1}_{\omega_i=0}-\widehat{g}_{\omega_i}|/\widehat{\lambda}_{\omega_i}\big)_{i\in[N]}$ and $\bar{\sigma}=\lVert g\rVert_{\mathcal{S}(\mathcal{H}_{\mathbf{s}})}$;
Return $\epsilon_\delta=M+4\sqrt{2}\,\bar{\sigma}\sqrt{\log(1/\delta)/N}$, i.e. the lower bound $f_\star\geq c-M-4\sqrt{2}\,\bar{\sigma}\sqrt{\log(1/\delta)/N}$;
Algorithm 1 GloptiNets
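The certificate step of algorithm 1 can be sketched in plain Python. Two points are assumptions: the number of blocks $k=\lceil 8\log(1/\delta)\rceil$ in the median-of-means estimator (a common choice in the median-of-means literature, which appears consistent with the constant $4\sqrt{2}=\sqrt{32}$ in the last line, but is not necessarily the authors' choice), and the callables `f_hat`, `g_hat`, `lam_hat`, which are hypothetical stand-ins for the Fourier coefficients of $f$, of $g$ (lemma 1), and for $\widehat{\lambda}$. Since $c=f(z)\geq f_\star$, the function returns both the gap bound $M+4\sqrt{2}\,\bar{\sigma}\sqrt{\log(1/\delta)/N}$ and the matching lower bound on $f_\star$.

```python
import math
import statistics


def median_of_means(values, delta: float) -> float:
    """Median-of-means estimate of E[values].

    Assumption: k = ceil(8 log(1/delta)) equal-size blocks; the remainder is dropped."""
    n = len(values)
    k = max(1, min(n, math.ceil(8 * math.log(1 / delta))))
    size = n // k
    means = [statistics.fmean(values[j * size:(j + 1) * size]) for j in range(k)]
    return statistics.median(means)


def certificate(f_hat, g_hat, lam_hat, omegas, c, sigma_bar, delta):
    """Certificate step of algorithm 1 (sketch).

    f_hat, g_hat, lam_hat: hypothetical callables giving the Fourier coefficient of f,
    of g, and the weight lambda_hat at a frequency omega (a tuple of ints).
    Returns (eps, lower): the gap bound and the lower bound c - eps on f_star."""
    n = len(omegas)
    zero = (0,) * len(omegas[0])
    # importance-weighted residual Fourier coefficients |f_hat - c 1_{omega=0} - g_hat| / lambda_hat
    ratios = [
        abs(f_hat(w) - (c if w == zero else 0.0) - g_hat(w)) / lam_hat(w)
        for w in omegas
    ]
    m = median_of_means(ratios, delta)
    eps = m + 4 * math.sqrt(2) * sigma_bar * math.sqrt(math.log(1 / delta) / n)
    return eps, c - eps


# toy check: f = 1 (only a constant coefficient), g = 0, c = 1, so the residual vanishes
omegas = [(0,), (1,), (-2,), (0,), (3,)] * 20
eps, lower = certificate(lambda w: 1.0 if w == (0,) else 0.0, lambda w: 0.0,
                         lambda w: 0.5, omegas, c=1.0, sigma_bar=0.2, delta=math.exp(-4))
print(eps, lower)
```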
Remark 3 (Providing a candidate).

In algorithm 1, a candidate estimate $c$ of the minimum value $f(x_\star)$ is necessary. However, this requirement can be overcome by incorporating $c$ as a learnable parameter within the training loop. Moreover, $x_\star$ can be learned using techniques similar to those of Rudi et al. (2020): by replacing the lower bound $c$ with a parabola centered at $z$, $z$ becomes a candidate for $x_\star$ with a precision corresponding to the tightness of the certificate. Note, however, that this method introduces additional hyperparameters.

3.3 Specific implementation for the Chebychev basis

As already observed in Bach and Rudi (2023), a result on trigonometric polynomials on $\mathbb{T}^d$ directly extends to real polynomials on $[-1,1]^d$. The reason is that minimizing $h$ on $[-1,1]^d$ amounts to minimizing the trigonometric polynomial $f=h(\cos 2\pi x_1,\dots,\cos 2\pi x_d)$ on $\mathbb{T}^d$. Note, however, that $f$ is even in every dimension: for any $x\in\mathbb{T}^d$ and any $i\in[d]$, $f(x)=f(x_1,\dots,-x_i,\dots,x_d)$. Thus, approximating $f-f_\star$ with a k-SoS model of definition 1 is suboptimal, in the sense that it would suffice to approximate $f$ on $[0,1/2]^d$, which is $2^d$ times smaller than $\mathbb{T}^d$. Put differently, the Fourier coefficients of $f$ are real by design, and it would be convenient to enforce this structure in the model $g$. This is achieved with proposition 1.
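In one dimension, the change of variables can be checked directly: since $T_k(\cos\theta)=\cos(k\theta)$ for the Chebychev polynomials of the first kind, $h=\sum_k a_kT_k$ becomes the cosine series $f(t)=\sum_k a_k\cos(2\pi kt)$, which is real, even, and attains on $[0,1/2]$ the same minimum as on the whole torus. The coefficients `a` are arbitrary illustrative values, and the grid search is only a sanity check, not a certificate.

```python
import math


def chebyshev_series(a, x: float) -> float:
    """h(x) = sum_k a_k T_k(x), using T_{k+1}(x) = 2x T_k(x) - T_{k-1}(x)."""
    t_prev, t_curr = 1.0, x  # T_0, T_1
    total = a[0] * t_prev
    for coef in a[1:]:
        total += coef * t_curr
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return total


def torus_version(a, t: float) -> float:
    """f(t) = h(cos 2 pi t) = sum_k a_k cos(2 pi k t), since T_k(cos theta) = cos(k theta)."""
    return sum(coef * math.cos(2.0 * math.pi * k * t) for k, coef in enumerate(a))


a = [0.1, -0.4, 0.3, 0.25]  # hypothetical Chebychev coefficients of h on [-1, 1]
grid = [i / 2000 for i in range(2000)]  # uniform grid on the torus [0, 1)
full_min = min(torus_version(a, t) for t in grid)
half_min = min(torus_version(a, t) for t in grid if t <= 0.5)
print(full_min, half_min)  # equal up to rounding: evenness makes [0, 1/2] enough
```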

Proposition 1 (Kernel defined on the Chebychev basis).

Let $q$ be a real, even function on the torus, bounded by $1$, as in eq. 6. Let $K$ be the kernel defined on $[-1,1]$ by

$\forall (u,v)\in(0,1/2)^2,\quad K(\cos 2\pi u,\cos 2\pi v)=\frac{1}{2}\big(q(u+v)+q(u-v)\big). \qquad (19)$

Then $K$ is a symmetric, p.d., hence reproducing kernel, bounded by $1$, with explicit feature map given by

$\forall (x,y)\in[-1,1]^2,\quad K(x,y)=\widehat{q}_0+\sum_{\omega\geq 1}2\widehat{q}_\omega H_\omega(x)H_\omega(y). \qquad (20)$

The proof is available in appendix B; it simply relies on expanding the definition of $K$ in eq. 19. The resulting expression in eq. 20 exhibits only cosine terms in the decomposition of $x\mapsto K(\cos 2\pi x,y)$, which makes it possible to directly extend the PSD models of definition 1 to such kernels. Finally, when used with the Bessel kernel of eq. 14, this yields an easy computation of the Chebychev coefficients, as shown in lemma 3, in $O(drm^2)$ time, so that any function expressed in the Chebychev basis can be approximated. Polynomials expressed in other bases can be certified too, by first performing a change of basis.
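Eq. 20 can be verified numerically. The sketch assumes that $H_\omega$ is the Chebychev polynomial of the first kind $T_\omega$ (the choice the expansion forces, since $T_\omega(\cos 2\pi u)=\cos 2\pi\omega u$), and uses the illustrative, hypothetical choice $q(t)=\exp(s(\cos 2\pi t-1))$, which is real, even, bounded by 1, and has Fourier coefficients $\widehat{q}_\omega=e^{-s}I_{|\omega|}(s)$. The series is truncated at $\omega\leq 40$.

```python
import math


def bessel_i(n: int, z: float, tol: float = 1e-17) -> float:
    """I_n(z) for integer n, from the power series sum_k (z/2)^(2k+n) / (k! (k+n)!)."""
    n = abs(n)
    if z == 0.0:
        return 1.0 if n == 0 else 0.0
    half = 0.5 * z
    term = math.exp(n * math.log(half) - math.lgamma(n + 1))
    total, k = term, 0
    while term > tol * total:
        k += 1
        term *= half * half / (k * (k + n))
        total += term
    return total


def q(t: float, s: float) -> float:
    """Illustrative choice: real, even, 1-periodic, bounded by 1 (equal to 1 at t = 0)."""
    return math.exp(s * (math.cos(2.0 * math.pi * t) - 1.0))


def q_hat(w: int, s: float) -> float:
    """Fourier coefficients of q: e^{-s} I_|w|(s)."""
    return math.exp(-s) * bessel_i(w, s)


def cheb_t(n: int, x: float) -> float:
    """Chebychev polynomial of the first kind on [-1, 1]: T_n(x) = cos(n arccos x)."""
    return math.cos(n * math.acos(x))


def kernel_eq19(u: float, v: float, s: float) -> float:
    """K(cos 2 pi u, cos 2 pi v) from its definition (eq. 19)."""
    return 0.5 * (q(u + v, s) + q(u - v, s))


def kernel_eq20(x: float, y: float, s: float, cutoff: int = 40) -> float:
    """K(x, y) from the truncated feature expansion (eq. 20)."""
    return q_hat(0, s) + sum(2.0 * q_hat(w, s) * cheb_t(w, x) * cheb_t(w, y)
                             for w in range(1, cutoff + 1))


s = 2.0
for u, v in [(0.1, 0.2), (0.05, 0.45), (0.3, 0.3)]:
    x, y = math.cos(2.0 * math.pi * u), math.cos(2.0 * math.pi * v)
    print(kernel_eq19(u, v, s), kernel_eq20(x, y, s))
```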

4 Experiments


[Figure 2 plot: Certificate (log scale) vs. # params (log scale); curves: poly, $\lVert h\rVert_{\mathcal{H}}=1$; poly, $\lVert h\rVert_{\mathcal{H}}=2$; kernel, $\lVert h\rVert_{\mathcal{H}}=1$; kernel, $\lVert h\rVert_{\mathcal{H}}=2$.]
Figure 2: Certificate vs. number of parameters in $g$, for a given function $h$. The higher the RKHS norm of $h$, the more difficult it is to approximate uniformly and the looser the certificate, independently of the function type. The more parameters in the k-SoS model, the tighter the certificates obtained with theorem 3.

The code to reproduce these experiments is available at

Settings.

Given a function $h$, we compute a candidate $\widehat{x}$ with gradient descent and multiple initializations. The goal is then to certify that $\widehat{x}$ is indeed a global minimizer of $h$. This is a common setup in the polynomial-SoS literature (Wang and Magron, 2022). To illustrate the influence of the number of parameters, the positive model $g$ of definition 1 used by GloptiNets is either a small model, GN-small, with 1792 parameters, or a bigger model, GN-big, with 22528 parameters. The latter should have higher expressivity and better interpolate positive functions, leading to tighter certificates. All results for GloptiNets are obtained with confidence $1-\delta=1-e^{-4}\geq 98\%$. All other details regarding the experiments are reported in appendix C.

Polynomials.

We first consider the case where $h$ is a random trigonometric polynomial. Note that this is a restrictive setting, as GloptiNets can handle any smooth function (i.e. possibly with infinitely many non-zero Fourier coefficients). The polynomials have various dimensions $d$, degrees $p$ and numbers of coefficients $n$, but a constant norm in the RKHS $\mathcal{H}_{2\mathbf{1}_d}$. We compare the performance of GloptiNets to TSSOS, in its complex polynomial variant (Wang and Magron, 2022). The latter is used with parameters such that it executes the fastest, but without guarantees of convergence to the global minimum $f_\star$. Table 1 shows the certificates on the gap $h(\widehat{x})-h(x_\star)$ and the execution times (lower is better, $t$ in seconds) for TSSOS, GN-small and GN-big. Figure 2 shows the certificate obtained on a random polynomial as a function of the number of parameters in $g$.

Kernel mixtures.

While polynomials provide ground for comparison with existing work, GloptiNets is not confined to this function class. This is evidenced by experiments on kernel mixtures, where our approach stands as the only viable alternative we are aware of. The functions we certify are of the form $h(x)=\sum_{i=1}^n\alpha_iK(x_i,x)$, where $K$ is the Bessel kernel of eq. 14. Kernel mixtures are ubiquitous in machine learning and arise e.g. when performing kernel ridge regression. Certificates obtained on mixtures are compared with those obtained on polynomials in fig. 2, as a function of the size of the model $g$.
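A kernel mixture is cheap to evaluate pointwise, yet has infinitely many non-zero Fourier coefficients: if $K(x,y)=\sum_\omega\widehat{q}_\omega e^{2i\pi\omega\cdot(x-y)}$, then $\widehat{h}_\omega=\widehat{q}_\omega\sum_i\alpha_ie^{-2i\pi\omega\cdot x_i}$. The sketch below assumes that the Bessel kernel of eq. 14 has the product form $K(x,y)=\prod_\ell\exp(\mathbf{s}_\ell(\cos 2\pi(x_\ell-y_\ell)-1))$, i.e. $\widehat{q}_\omega=\prod_\ell e^{-\mathbf{s}_\ell}I_{|\omega_\ell|}(\mathbf{s}_\ell)$; this is our reconstruction, not a quote of eq. 14. Centers, weights and function names are hypothetical placeholders, and the check reconstructs $h$ from its truncated Fourier series in dimension 1.

```python
import cmath
import math
import random


def bessel_i(n: int, z: float, tol: float = 1e-17) -> float:
    """I_n(z) for integer n, from the power series sum_k (z/2)^(2k+n) / (k! (k+n)!)."""
    n = abs(n)
    if z == 0.0:
        return 1.0 if n == 0 else 0.0
    half = 0.5 * z
    term = math.exp(n * math.log(half) - math.lgamma(n + 1))
    total, k = term, 0
    while term > tol * total:
        k += 1
        term *= half * half / (k * (k + n))
        total += term
    return total


def bessel_kernel(x, y, s) -> float:
    """Assumed product form: prod_l exp(s_l (cos 2 pi (x_l - y_l) - 1))."""
    return math.prod(math.exp(s_l * (math.cos(2.0 * math.pi * (x_l - y_l)) - 1.0))
                     for x_l, y_l, s_l in zip(x, y, s))


def mixture(x, centers, alphas, s) -> float:
    """h(x) = sum_i alpha_i K(x_i, x)."""
    return sum(a * bessel_kernel(c, x, s) for a, c in zip(alphas, centers))


def mixture_fourier_coeff(omega, centers, alphas, s) -> complex:
    """h_hat_omega = q_hat_omega * sum_i alpha_i exp(-2 i pi omega . x_i)."""
    q_hat = math.prod(math.exp(-s_l) * bessel_i(w, s_l) for w, s_l in zip(omega, s))
    phase = sum(a * cmath.exp(-2j * math.pi * sum(w * c_l for w, c_l in zip(omega, c)))
                for a, c in zip(alphas, centers))
    return q_hat * phase


rng = random.Random(0)
s = (3.0,)  # d = 1 keeps the Fourier reconstruction below cheap
centers = [(rng.random(),) for _ in range(5)]
alphas = [rng.uniform(-1.0, 1.0) for _ in range(5)]
x = (0.37,)
series = sum(mixture_fourier_coeff((w,), centers, alphas, s) * cmath.exp(2j * math.pi * w * x[0])
             for w in range(-30, 31))
print(mixture(x, centers, alphas, s), series.real)
```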

Table 1: GloptiNets and TSSOS on random trigonometric polynomials. While TSSOS provides machine-precision certificates, its running time grows exponentially with the problem size, and eventually fails on problems 3 and 6. On the other hand, GloptiNets has constant running time no matter the problem size, and its certificates can be tightened by increasing the model size.
#   d   p   n     TSSOS                    GN-small                GN-big
                  Certif.        t (s)     Certif.       t (s)     Certif.       t (s)
1   3   5   85    5.3·10^-11     3         8.35·10^-4    6·10^3    2.64·10^-4    9·10^3
2   3   7   231   4.7·10^-13     120       9.51·10^-4    6·10^3    2.90·10^-4    9·10^3
3   3   9   489   out of memory! -         1.18·10^-3    6·10^3    3.34·10^-4    9·10^3
4   4   3   33    3.1·10^-10     0.1       2.46·10^-2    1·10^4    3.45·10^-3    2·10^4
5   4   5   225   4.8·10^-12     53        3.71·10^-2    1·10^4    3.59·10^-3    2·10^4
6   4   7   833   out of memory! -         4.76·10^-2    1·10^4    4.85·10^-3    2·10^4

Results.

There are two key insights about the performance of GloptiNets. Firstly, its certificate does not depend on the structure of the function to optimize. Thus, although GloptiNets does not match the performance of TSSOS on small polynomials, it can tackle polynomials that competitors cannot handle, with arbitrarily many coefficients (even $n=\infty$). For instance, TSSOS cannot handle the problems with $n\in\{489,833\}$ in table 1. More importantly, GloptiNets can certify a richer class of functions than polynomials, among which kernel mixtures. The performance of GloptiNets mostly depends on the complexity of the function to certify, as measured by its RKHS norm.

Secondly, note that a bigger model yields tighter certificates. This is detailed in fig. 2, where the same function $f$ is optimized with various models. The dependency of the certificate on the norm of $f$ is shown in fig. 3 in appendix C, along with experiments with Chebychev polynomials.

5 Limitations

One limitation of GloptiNets is the trade-off resulting from its high flexibility in how the certificate of algorithm 1 is obtained. While this flexibility offers numerous advantages, it also introduces the need for an extensive hyperparameter search. Although we have identified a set of hyperparameters that aligns with deep learning practice (a Momentum optimizer with cosine decay and a large initial learning rate), the optimal settings may vary with the problem at hand.
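As an illustration of this kind of configuration only, here is a toy sketch of heavy-ball momentum with a cosine-decayed step size on a one-dimensional quadratic. The values `lr0=0.5`, `beta=0.9` and `total_steps=200` are arbitrary and are not the settings used by the authors; the function names are hypothetical.

```python
import math


def cosine_lr(step: int, total_steps: int, lr0: float) -> float:
    """Cosine decay of the step size from lr0 (step 0) towards 0 (step total_steps)."""
    return 0.5 * lr0 * (1.0 + math.cos(math.pi * step / total_steps))


def momentum_descent(grad, x0: float, lr0: float = 0.5, beta: float = 0.9,
                     total_steps: int = 200) -> float:
    """Heavy-ball momentum: v <- beta v + grad(x), then x <- x - lr_t v."""
    x, v = x0, 0.0
    for step in range(total_steps):
        v = beta * v + grad(x)
        x -= cosine_lr(step, total_steps, lr0) * v
    return x


# toy objective (x - 3)^2 with gradient 2 (x - 3)
print(momentum_descent(lambda x: 2.0 * (x - 3.0), x0=0.0))
```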

In the same vein, the certificates given by GloptiNets are of moderate accuracy. While adding more parameters to the k-SoS model certainly helps (as shown in fig. 2), alternative optimization schemes to interpolate $h-h(\widehat{x})$ with $g$ might provide easier improvements. For instance, we found that using an approximate second-order scheme in algorithm 1 is key to obtaining good certificates.

In the specific setting of polynomial optimization, we highlight that our model is not competitive on problems that exhibit some algebraic structure, for instance term sparsity or the constant trace property. Typically, problems of low degree (at most $2$), which notably encompass the OPF problem, are handled very well by the family of solvers TSSOS belongs to. Finally, GloptiNets does not handle constraints yet.

6 Conclusion

The GloptiNets algorithm presented in this work lays the foundation for a new family of solvers which provide certificates for non-convex problems. While our approach does not aim to replace the well-established Lasserre hierarchy for sparse polynomials, it offers a fresh perspective on tackling a new set of problems at scale. Through demonstrations on synthetic examples, we have showcased the potential of our approach. Further research directions include extensive parameter tuning to obtain tighter certificates, possibly leveraging second-order optimization schemes, along with warm-restart schemes for applications that require solving multiple similar problems sequentially.

Acknowledgments.

AR acknowledges support of the French government under management of Agence Nationale de la Recherche as part of the “Investissements d’avenir” program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute). AR acknowledges support of the European Research Council (grant REAL 947908). JM was supported by the ERC grant number 101087696 (APHELAIA project) and by ANR 3IA MIAI@Grenoble Alpes (ANR-19-P3IA-0003).

References

  • Bach and Rudi [2023] Francis Bach and Alessandro Rudi. Exponential convergence of sum-of-squares hierarchies for trigonometric polynomials. SIAM Journal on Optimization, 33(3):2137–2159, 2023.
  • Berthier et al. [2022] Eloïse Berthier, Justin Carpentier, Alessandro Rudi, and Francis Bach. Infinite-Dimensional Sums-of-Squares for Optimal Control. In 2022 IEEE 61st Conference on Decision and Control (CDC), pages 577–582, December 2022. doi: 10.1109/CDC51059.2022.9992396.
  • Boyd and Vandenberghe [2004] Stephen P. Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
  • Devroye et al. [2016] Luc Devroye, Matthieu Lerasle, Gabor Lugosi, and Roberto I. Oliveira. Sub-Gaussian Mean Estimators. The Annals of Statistics, 44(6):2695–2725, 2016.
  • Dũng et al. [2017] Dinh Dũng, Vladimir N. Temlyakov, and Tino Ullrich. Hyperbolic Cross Approximation. arXiv:2211.04889, April 2017.
  • Goodfellow et al. [2016] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
  • Henrion et al. [2020] Didier Henrion, Milan Korda, and Jean-Bernard Lasserre. The Moment-SOS Hierarchy, volume 4 of Optimization and Its Applications. World Scientific Publishing Europe Ltd., December 2020. doi: 10.1142/q0252.
  • Hilling and Sudbery [2010] Joseph J. Hilling and Anthony Sudbery. The geometric measure of multipartite entanglement and the singular values of a hypermatrix. Journal of Mathematical Physics, 51(7):072102, July 2010.
  • Horst and Pardalos [2013] Reiner Horst and Panos M. Pardalos. Handbook of Global Optimization, volume 2. Springer Science & Business Media, 2013.
  • Josz and Molzahn [2018] Cédric Josz and Daniel K. Molzahn. Lasserre Hierarchy for Large Scale Polynomial Optimization in Real and Complex Variables. SIAM Journal on Optimization, 28(2):1017–1048, January 2018.
  • Lasserre [2001] Jean B. Lasserre. Global Optimization with Polynomials and the Problem of Moments. SIAM Journal on Optimization, 11(3):796–817, January 2001. doi: 10.1137/S1052623400366802.
  • Lasserre [2009] Jean Bernard Lasserre. Moments, Positive Polynomials and Their Applications, volume 1 of Series on Optimization and Its Applications. October 2009. doi: 10.1142/p665.
  • Laurent and Slot [2022] Monique Laurent and Lucas Slot. An effective version of Schmüdgen’s Positivstellensatz for the hypercube. Optimization Letters, September 2022. doi: 10.1007/s11590-022-01922-5.
  • Mai et al. [2022] Ngoc Hoang Anh Mai, J. B. Lasserre, Victor Magron, and Jie Wang. Exploiting Constant Trace Property in Large-scale Polynomial Optimization. ACM Transactions on Mathematical Software, 48(4):40:1–40:39, December 2022.
  • Marteau-Ferey et al. [2020] Ulysse Marteau-Ferey, Francis Bach, and Alessandro Rudi. Non-parametric Models for Non-negative Functions. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 12816–12826. Curran Associates, Inc., 2020.
  • Moscato et al. [1989] Pablo Moscato et al. On evolution, search, optimization, genetic algorithms and martial arts: Towards memetic algorithms. Caltech concurrent computation program, C3P Report, 826(1989):37, 1989.
  • Muzellec et al. [2021] Boris Muzellec, Adrien Vacher, Francis Bach, François-Xavier Vialard, and Alessandro Rudi. Near-optimal estimation of smooth transport maps with kernel sums-of-squares. arXiv:2112.01907, December 2021.
  • Paulsen and Raghupathi [2016] Vern I. Paulsen and Mrinal Raghupathi. An Introduction to the Theory of Reproducing Kernel Hilbert Spaces. Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 2016. doi: 10.1017/CBO9781316219232.
  • Rudi and Ciliberto [2021] Alessandro Rudi and Carlo Ciliberto. PSD Representations for Effective Probability Models. In Advances in Neural Information Processing Systems, volume 34, pages 19411–19422. Curran Associates, Inc., 2021.
  • Rudi et al. [2020] Alessandro Rudi, Ulysse Marteau-Ferey, and Francis Bach. Finding Global Minima via Kernel Approximations. arXiv:2012.11978, December 2020.
  • Rudin [1990] Walter Rudin. The Basic Theorems of Fourier Analysis. In Fourier Analysis on Groups, chapter 1, pages 1–34. John Wiley & Sons, Ltd, 1990. doi: 10.1002/9781118165621.ch1.
  • Steinwart and Christmann [2008] Ingo Steinwart and Andreas Christmann. Support Vector Machines. Springer Science & Business Media, 2008.
  • Van Hentenryck [18] Pascal Van Hentenryck. Machine Learning for Optimal Power Flows. INFORMS Tutorials in Operations Research, October 18.
  • Van Laarhoven et al. [1987] Peter J. M. van Laarhoven and Emile H. L. Aarts. Simulated Annealing. Springer, 1987.
  • Waki et al. [2006] Hayato Waki, Sunyoung Kim, Masakazu Kojima, and Masakazu Muramatsu. Sums of Squares and Semidefinite Program Relaxations for Polynomial Optimization Problems with Structured Sparsity. SIAM Journal on Optimization, 17(1):218–242, January 2006. doi: 10.1137/050623802.
  • Waldspurger et al. [2013] Irène Waldspurger, Alexandre d’Aspremont, and Stéphane Mallat. Phase Recovery, MaxCut and Complex Semidefinite Programming, July 2013.
  • Wang and Magron [2022] Jie Wang and Victor Magron. Exploiting Sparsity in Complex Polynomial Optimization. Journal of Optimization Theory and Applications, 192(1):335–359, January 2022.
  • Wang et al. [2021a] Jie Wang, Victor Magron, and Jean-Bernard Lasserre. Chordal-TSSOS: A Moment-SOS Hierarchy That Exploits Term Sparsity with Chordal Extension. SIAM Journal on Optimization, 31(1):114–141, January 2021a. doi: 10.1137/20M1323564.
  • Wang et al. [2021b] Jie Wang, Victor Magron, and Jean-Bernard Lasserre. TSSOS: A Moment-SOS Hierarchy That Exploits Term Sparsity. SIAM Journal on Optimization, 31(1):30–58, January 2021b. doi: 10.1137/19M1307871.
  • Watson [1922] G. N. Watson. A Treatise on the Theory of Bessel Functions. Cambridge University Press, 1922.
  • Woodworth et al. [2022] Blake Woodworth, Francis Bach, and Alessandro Rudi. Non-Convex Optimization with Certificates and Fast Rates Through Kernel Sums of Squares. In Proceedings of Thirty Fifth Conference on Learning Theory, pages 4620–4642. PMLR, June 2022.

Appendix A Extensions

We explore additional extensions of GloptiNets that further enhance its appeal. We first describe a block-diagonal structure for the model that allows faster evaluation, then a theoretical splitting scheme for optimization, and finally a warm-start scheme.

A.1 Block diagonal structure for efficient computation

Without any further assumption, a model from definition 1 can be evaluated in $O(drm)$ time, and each of its Fourier coefficients, given by lemma 1, in $O(dm^2r)$ time; the bound on its RKHS norm is computed in $O(dm^2+mr^2)$ time thanks to lemma 2. Altogether, a certificate as stated in theorem 2 can be computed in $O(Ndm^2r+mr^2)$ time, where $N$ is the number of sampled frequencies. If the function $f$ to be minimized has a large $\mathcal{H}_{\mathbf{s}}$ norm, we might need a large model size $m$ to have $f-f_\star\approx g$. Hence, we introduce a specific structure on $G$ which makes it block-diagonal and better conditioned.
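The $O(drm)$ evaluation cost can be made concrete. Reading the operator $A$ in the proof of lemma 2 as $A=\Phi R^\top R\,\Phi^*$ with $\Phi=(\varphi(\mathbf{z}_1),\dots,\varphi(\mathbf{z}_m))$, the model evaluates as $g(x)=\lVert R\,K_{\mathbf{z}}(x)\rVert^2$ with $K_{\mathbf{z}}(x)=(K(\mathbf{z}_j,x))_{j\leq m}$: $m$ kernel evaluations of cost $O(d)$ each, then an $r\times m$ matrix-vector product, so $O(dm+rm)\subseteq O(drm)$. The sketch assumes the Bessel kernel has the product form $\prod_\ell\exp(\mathbf{s}_\ell(\cos 2\pi(x_\ell-y_\ell)-1))$; this form, the function names and the sizes are hypothetical.

```python
import math
import random


def bessel_kernel(x, y, s) -> float:
    """Assumed product form: prod_l exp(s_l (cos 2 pi (x_l - y_l) - 1)); O(d) work."""
    return math.prod(math.exp(s_l * (math.cos(2.0 * math.pi * (x_l - y_l)) - 1.0))
                     for x_l, y_l, s_l in zip(x, y, s))


def features(x, anchors, s):
    """K_z(x) = (K(z_j, x))_{j <= m}: m kernel evaluations, O(dm) in total."""
    return [bessel_kernel(z, x, s) for z in anchors]


def psd_model(x, R, anchors, s) -> float:
    """g(x) = ||R K_z(x)||^2 with R an r x m matrix (list of rows): O(dm + rm) work."""
    k = features(x, anchors, s)
    return sum(sum(r_jk * k_k for r_jk, k_k in zip(row, k)) ** 2 for row in R)


rng = random.Random(1)
d, m, r = 3, 8, 4  # hypothetical sizes
s = (2.0,) * d
anchors = [tuple(rng.random() for _ in range(d)) for _ in range(m)]
R = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(r)]
print(psd_model((0.2, 0.4, 0.9), R, anchors, s))  # non-negative by construction
```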

Proposition 2 (Block-diagonal PSD model).

Let $g$ be a PSD model as in definition 1, with $m=bs$ anchors. Split them into $b$ groups and denote them $\mathbf{z}_{ij}$, $i\in[b]$, $j\in[s]$. Compute the Cholesky factorization $T_i^\top T_i=K_{\mathbf{z}_i}\in\mathbb{R}^{s\times s}$ of each kernel matrix. Then, define $G$ as a block-diagonal matrix with $b$ blocks $G_i=\tilde{R}_i\tilde{R}_i^\top$, where $\tilde{R}_i=T_i^{-1}R_i$ and $R_i\in\mathbb{R}^{s\times r}$. Equivalently,

$$G=\begin{pmatrix}\tilde{R}_1\tilde{R}_1^\top&&\\&\ddots&\\&&\tilde{R}_b\tilde{R}_b^\top\end{pmatrix},\quad\text{s.t.}\quad g(x)=\sum_{i=1}^b\left\lVert\tilde{R}_i^\top K_{\mathbf{z}_i}(x)\right\rVert^2,\quad K_{\mathbf{z}_i}(x)=\big(K(\mathbf{z}_{ij},x)\big)_{1\leq j\leq s}.\qquad(21)$$

Then $g$ can be evaluated in $O(rbs^3d)$ time, $\widehat{g}_\omega$ in $O(bs^2(dr+s))$ time, and $\lVert g\rVert_{\mathcal{S}(\mathcal{H}_s)}^2$ in $O(b^2(rs^2+r^2s)+bs^3)$ time. The model has $(r+d)bs$ real parameters.

Proof.

With $G$ defined in this way, $G$ is PSD and has rank at most $rb\leq sb=m$. Writing $g(x)=\sum_{i=1}^b\lVert\tilde{R}_i^\top K_{\mathbf{z}_i}(x)\rVert^2$, we can compute the Fourier coefficients by applying lemma 1 to each of the $b$ components. Adding the cost of computing $G_i=\tilde{R}_i\tilde{R}_i^\top$ results in a complexity of $O(bs^2(dr+s))$.
Finally, note that $\lVert g\rVert_{\mathcal{S}(\mathcal{H}_s)}^2=\lVert A\rVert_{\mathcal{S}(\mathcal{H}_s)}^2$, where

$$A=\big((\varphi(\mathbf{z}_{1j}))_{j\in[s]},\dots,(\varphi(\mathbf{z}_{bj}))_{j\in[s]}\big)\,(\operatorname{Diag}G_i)_{i\in[b]}\,\big((\varphi(\mathbf{z}_{1j}))_{j\in[s]},\dots,(\varphi(\mathbf{z}_{bj}))_{j\in[s]}\big)^*.$$

Then, defining $Q$ as the $b\times b$ block matrix with blocks of size $s\times s$ such that, for $j,k\in[b]$, $Q_{jk}=\big(K(\mathbf{z}_{jp},\mathbf{z}_{kq})\big)_{p,q\in[s]}\in\mathbb{R}^{s\times s}$, we have

$$\lVert A\rVert_{\mathcal{S}(\mathcal{H}_s)}^2=\operatorname{Tr}\,Q(\operatorname{Diag}G_i)_{i\in[b]}Q(\operatorname{Diag}G_i)_{i\in[b]}=\sum_{j,k=1}^b\operatorname{Tr}\,G_jQ_{jk}G_kQ_{kj},\qquad(22)$$

and, by cyclicity of the trace, each term in the sum can be written $\operatorname{Tr}(\tilde{R}_j^\top Q_{jk}\tilde{R}_k)(\tilde{R}_k^\top Q_{kj}\tilde{R}_j)=\lVert\tilde{R}_j^\top Q_{jk}\tilde{R}_k\rVert_{HS}^2$, since $Q_{kj}=Q_{jk}^\top$. Each term is computed in $O(rs^2+r^2s)$ time, and computing the Cholesky factors adds $O(bs^3)$. ∎

Denoting $\varphi_{\mathbf{z}_i}=(\varphi(\mathbf{z}_{ij}))_{1\leq j\leq s}$, note that

$$\varphi_{\mathbf{z}_i}G_i\varphi_{\mathbf{z}_i}^*=\varphi_{\mathbf{z}_i}T_i^{-1}R_iR_i^\top(\varphi_{\mathbf{z}_i}T_i^{-1})^*=E_iR_iR_i^\top E_i^*,\qquad(23)$$

with $E_i=\varphi_{\mathbf{z}_i}T_i^{-1}$ an orthonormal basis of $\operatorname{Span}(\varphi(\mathbf{z}_{ij}))_{1\leq j\leq s}$, since $E_i^*E_i=T_i^{-\top}K_{\mathbf{z}_i}T_i^{-1}=\mathbf{I}_s$. Thus, the coefficients $R_i$ of each block are expressed in an orthonormal basis, which makes the optimization easier. This comes at an additional $O(s^3)$ cost per block, which could be alleviated by using, e.g., an incomplete Cholesky factorization instead.
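The construction of proposition 2 can be checked numerically on a toy instance. The following is a minimal pure-Python sketch, not the GloptiNets implementation: the product kernel `kernel` on the torus (with a hypothetical `bandwidth=3.0`), the helper names and the anchors are illustrative assumptions. It builds $\tilde{R}_i=T_i^{-1}R_i$ from per-block Cholesky factors, evaluates $g$ through eq. (21), and computes $\lVert g\rVert_{\mathcal{S}(\mathcal{H}_s)}^2$ block by block as in eq. (22).

```python
import math

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def cholesky_upper(k):
    """Upper-triangular T with T^T T = k, for a symmetric positive definite k."""
    n = len(k)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = k[i][j] - sum(low[i][p] * low[j][p] for p in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return transpose(low)

def solve_upper(t, rhs):
    """Solve t x = rhs by back substitution (t upper triangular, rhs a matrix)."""
    n, cols = len(t), len(rhs[0])
    x = [[0.0] * cols for _ in range(n)]
    for c in range(cols):
        for i in reversed(range(n)):
            acc = rhs[i][c] - sum(t[i][p] * x[p][c] for p in range(i + 1, n))
            x[i][c] = acc / t[i][i]
    return x

def kernel(x, z, bandwidth=3.0):
    """Hypothetical product kernel on the torus: prod_l exp(bw * (cos 2 pi (x_l - z_l) - 1))."""
    return math.prod(math.exp(bandwidth * (math.cos(2 * math.pi * (a - b)) - 1))
                     for a, b in zip(x, z))

def gram(za, zb):
    return [[kernel(p, q) for q in zb] for p in za]

def block_model(anchors, factors):
    """R_tilde_i = T_i^{-1} R_i with T_i^T T_i = K_{z_i} (anchors[i]: s points, factors[i]: s x r)."""
    return [solve_upper(cholesky_upper(gram(z, z)), r) for z, r in zip(anchors, factors)]

def evaluate(x, anchors, r_tilde):
    """g(x) = sum_i ||R_tilde_i^T K_{z_i}(x)||^2, as in eq. (21)."""
    total = 0.0
    for z, rt in zip(anchors, r_tilde):
        kx = [[kernel(p, x)] for p in z]            # K_{z_i}(x) as an s x 1 column
        total += sum(v[0] ** 2 for v in matmul(transpose(rt), kx))
    return total

def squared_norm(anchors, r_tilde):
    """sum_{j,k} ||R_tilde_j^T Q_jk R_tilde_k||_HS^2, as in eq. (22)."""
    total = 0.0
    for zj, rj in zip(anchors, r_tilde):
        for zk, rk in zip(anchors, r_tilde):
            block = matmul(matmul(transpose(rj), gram(zj, zk)), rk)
            total += sum(v * v for row in block for v in row)
    return total
```

Forming the dense $m\times m$ matrices $Q$ and $G$ and computing $\operatorname{Tr}\,QGQG$ should give the same squared norm; the block form only multiplies $s\times s$ and $s\times r$ matrices, which is where the $O(b^2(rs^2+r^2s))$ term comes from.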

Remark 4 (Relation to Term Sparsity in POP).

The successful application of polynomial hierarchies to problems with thousands of variables relies on imposing a block structure on the moment matrix $M$ Wang et al. [2021b, a]. If the monomial basis has size $m$, the constraint $M\succeq0$ is replaced with $M=(\operatorname{Diag}M_i)_{i\in[b]}$ and $M_i\succeq0$. This allows solving $b$ SDPs of size at most $s$ instead of one of size $m$. Our model in proposition 2 follows a similar route to reduce the computational budget.

A.2 Global optimization with splitting scheme

While GloptiNets can provide certificates, it is less competitive than local solvers at simply finding a good candidate minimizer. The difficulty is that finding a certificate is considerably harder than finding a local minimum, as it requires approximating the entire function uniformly. We now present an algorithmic framework that could make GloptiNets more competitive with local solvers while still delivering certificates. Our approach partitions the search domain into multiple regions and computes a lower bound on each region. By discarding the regions where we can certify that the function exceeds a given threshold, the algorithm progressively shrinks the search space and simplifies the optimization problem. Moreover, this approach is naturally suited to parallel computation.

The algorithm relies on a divide-and-conquer mechanism. First, we split the hypercube $[-1,1]^d$ into $N$ regions, where $N$ is the number of available cores, and compute an upper bound with a local solver. We then run GloptiNets in parallel on each region, computing a certificate at regular intervals. As soon as the certificate exceeds the upper bound, we stop the process: with high probability, the global minimum is not in the associated region. We then reallocate the freed computing power by splitting the biggest current region, which yields easier subproblems. We stop as soon as the regions considered are small enough. This is summarized in algorithm 2, where P marks the loops run in parallel.

Note that minimizing $f$ on a hypercube of center $\mu$ and half-width $\sigma$, i.e. on $\mu+\sigma[-1,1]^d$, amounts to minimizing $x\mapsto f(\mu+\sigma x)$ on $[-1,1]^d$. This is another Chebychev polynomial, whose coefficients can be computed efficiently thanks to the three-term recurrence satisfied by every family of orthogonal polynomials. For Chebychev polynomials, it reads $H_{\omega+1}(x)=2xH_\omega(x)-H_{\omega-1}(x)$.
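A one-dimensional sketch of this rescaling in pure Python, assuming a polynomial is stored as the list `c` of its Chebychev coefficients, so that $p(x)=\sum_k c_kH_k(x)$; the function names are hypothetical. It returns the Chebychev coefficients of $x\mapsto p(\mu+\sigma x)$, i.e. of $p$ restricted to $\mu+\sigma[-1,1]$ and pulled back to $[-1,1]$, by building $H_k(\mu+\sigma x)$ with the recurrence and the identities $xH_0=H_1$, $xH_k=(H_{k+1}+H_{k-1})/2$ for $k\geq1$. In $d$ dimensions, the same map would be applied coordinate by coordinate to each tensor-product term.

```python
def cheb_eval(c, x):
    """Evaluate sum_k c[k] H_k(x) with the three-term recurrence."""
    if not c:
        return 0.0
    total, h_prev, h_cur = c[0], 1.0, x
    for k in range(1, len(c)):
        total += c[k] * h_cur
        h_prev, h_cur = h_cur, 2 * x * h_cur - h_prev
    return total

def _times_x(a):
    """Chebychev coefficients of x * sum_k a[k] H_k(x)."""
    out = [0.0] * (len(a) + 1)
    for k, ak in enumerate(a):
        if k == 0:
            out[1] += ak                 # x H_0 = H_1
        else:
            out[k + 1] += ak / 2         # x H_k = (H_{k+1} + H_{k-1}) / 2
            out[k - 1] += ak / 2
    return out

def _axpy(alpha, a, b):
    """alpha * a + b, padding the shorter coefficient list with zeros."""
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))
    b = b + [0.0] * (n - len(b))
    return [alpha * ai + bi for ai, bi in zip(a, b)]

def cheb_affine(c, mu, sigma):
    """Chebychev coefficients of x -> sum_k c[k] H_k(mu + sigma * x)."""
    t_prev, t_cur = [1.0], [mu, sigma]   # H_0(y) and H_1(y) = y, with y = mu + sigma x
    result = _axpy(c[0], t_prev, [])
    if len(c) > 1:
        result = _axpy(c[1], t_cur, result)
    for k in range(2, len(c)):
        # H_k(y) = 2 mu H_{k-1}(y) + 2 sigma x H_{k-1}(y) - H_{k-2}(y)
        t_next = _axpy(2 * sigma, _times_x(t_cur), _axpy(2 * mu, t_cur, [-v for v in t_prev]))
        t_prev, t_cur = t_cur, t_next
        result = _axpy(c[k], t_cur, result)
    return result
```

For instance, restricting $H_2$ to $[0,1]$ ($\mu=\sigma=1/2$) gives $2(\tfrac12+\tfrac x2)^2-1=-\tfrac14H_0+H_1+\tfrac14H_2$, which is what `cheb_affine([0.0, 0.0, 1.0], 0.5, 0.5)` returns.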

Data: A Chebychev polynomial $f$ with a unique global optimum, a probability $\delta$, a number of cores $N$ and a volume $\rho<1/N$.
Result: A certificate on $f$: $f_\star\geq C_\delta(f)$ with probability $1-\delta_\star$.
/* Initialization: upper bound and partition */
$\Pi=\mathsf{partition}([-1,1]^d,N)$, $\delta_\star=0$;
P $\mathsf{ub}=\min_{\pi\in\Pi}\left\{\mathsf{localsolver}_{x\in\pi}f(x)\right\}$;
/* Iterate over the partition */
P for $\pi\in\Pi$, while $\mathsf{length}(\Pi)>1$ do
       while $C_\delta(f_\pi)<\mathsf{ub}$ do
             Continue optimization;
       Split biggest part: $\pi_0=\arg\max_{\pi\in\Pi}\mathrm{Vol}(\pi)$; $(\pi_1,\pi_2)=\mathsf{partition}(\pi_0,2)$;
       If $\mathrm{Vol}(\pi_{1,2})<\rho$: end this process;
       Update upper bound: $\mathsf{ub}=\min\left\{\mathsf{ub},\mathsf{localsolver}_{x\in\pi_{1,2}}f(x)\right\}$;
       Update search space and $\delta_\star$: $\Pi=(\Pi\setminus\{\pi,\pi_0\})\cup\{\pi_1,\pi_2\}$, $\delta_\star=1-(1-\delta_\star)(1-\delta)$;

/* A single region in $\Pi$ remains */
Returns $\Pi=\{\pi\}$, $C_\delta(f_\pi)$, $\delta_\star$;
Algorithm 2 Splitting scheme with GloptiNets
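To make the control flow of algorithm 2 concrete, here is a sequential pure-Python sketch (no parallelism) under loudly stated assumptions: the GloptiNets certificate $C_\delta(f_\pi)$ is replaced by a swapped-in deterministic Lipschitz lower bound $f(c)-L\max_{x\in\pi}\lVert x-c\rVert_2$ with a user-supplied constant `lip`, the local solver is replaced by evaluating $f$ at box centers, the volume threshold `rho` is absolute rather than relative, and all names are hypothetical. Since the bound is deterministic, no $\delta_\star$ is tracked.

```python
import math

def volume(box):
    """Volume of an axis-aligned box given as a list of (lo, hi) pairs."""
    return math.prod(hi - lo for lo, hi in box)

def center(box):
    return [(lo + hi) / 2 for lo, hi in box]

def split(box):
    """Bisect a box along its longest edge (stand-in for partition(pi_0, 2))."""
    k = max(range(len(box)), key=lambda i: box[i][1] - box[i][0])
    lo, hi = box[k]
    left, right = list(box), list(box)
    left[k], right[k] = (lo, (lo + hi) / 2), ((lo + hi) / 2, hi)
    return left, right

def lower_bound(f, box, lip):
    """Hypothetical stand-in for C_delta(f_pi): f(c) - L * max_{x in box} ||x - c||_2."""
    radius = math.sqrt(sum(((hi - lo) / 2) ** 2 for lo, hi in box))
    return f(center(box)) - lip * radius

def split_and_prune(f, d, lip, n_init=4, rho=1e-2, max_iter=10_000):
    """Sequential sketch of algorithm 2 on [-1, 1]^d; returns (regions, lower, upper)."""
    boxes = [[(-1.0, 1.0)] * d]
    while len(boxes) < n_init:                  # initial partition into n_init regions
        boxes.sort(key=volume)
        boxes.extend(split(boxes.pop()))
    ub = min(f(center(b)) for b in boxes)       # center values stand in for the local solver
    for _ in range(max_iter):
        # discard regions whose lower bound exceeds the current upper bound
        boxes = [b for b in boxes if lower_bound(f, b, lip) <= ub]
        boxes.sort(key=volume)
        if len(boxes) <= 1 or volume(boxes[-1]) < rho:
            break
        for child in split(boxes.pop()):        # split the biggest remaining region
            ub = min(ub, f(center(child)))
            boxes.append(child)
    return boxes, min(lower_bound(f, b, lip) for b in boxes), ub
```

The region containing the global minimizer is never discarded, since its lower bound is at most $f_\star\leq\mathsf{ub}$; the returned pair brackets $f_\star$.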

A.3 Warm restarts

Our model distinguishes itself by leveraging the analytical properties of the objective function rather than relying solely on its algebraic characteristics. This offers a notable advantage: closely related functions can naturally benefit from a warm restart. For example, if we already have a certificate for a function $f$ using a PSD model $g$, and we seek a certificate for a similar function $\tilde{f}\approx f$, we can run GloptiNets with the PSD model initialized at $g$. Indeed, if $f-f_\star\approx g$, then $\tilde{f}-\tilde{f}_\star\approx g$ should hold as well, so we can expect the optimization to converge faster.

In contrast, P-SoS methods, which rely on semidefinite programs (SDPs), cannot directly adapt to new problems without significant effort: for instance, if a new component is introduced, an entirely new SDP must be solved. The ability of our model to accommodate related yet distinct problems could prove highly valuable in domains that frequently require certifying closely related problems. In industry, for example, the Optimal Power Flow (OPF) problem must be solved every 5 minutes Van Hentenryck [18]. With GloptiNets, once the initial, challenging solve is performed, subsequent solves should become easier, assuming small changes in supply and demand conditions.

A.4 Optimizing the certificate directly

As explained in section 3.2, where GloptiNets is introduced, we optimize a proxy of the $L_\infty$ norm rather than the certificate of theorems 2 and 3. This proxy is the log-sum-exp over a random batch of $N$ points. The reason is that evaluating an extended k-SoS model $g(x)$ at $x\in\mathbb{T}^d$ requires $O(drs)$ time, while evaluating $\widehat{g}_\omega$ at $\omega\in\mathbb{Z}^d$ requires $O(drs^2)$ time. Still, optimizing the certificate directly could help obtain higher-precision certificates. Lemma 4 in appendix D sketches a method that reduces the cost of the Fourier computation from $O(s^2)$ to $O(s)$.
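A minimal sketch of such a smooth surrogate, assuming (hypothetically; the exact form and scaling used in section 3.2 may differ) that the batch residuals are $r_n=f(x_n)-c-g(x_n)$ for some current estimate $c$ of $f_\star$, and that the surrogate is $\frac1t\log\sum_n\big(e^{tr_n}+e^{-tr_n}\big)$ with a temperature `t` introduced here. It satisfies $\max_n|r_n|\leq\text{proxy}\leq\max_n|r_n|+\log(2N)/t$, and its gradient with respect to the residuals is a softmax concentrated on the worst points.

```python
import math

def smooth_max(values, t):
    """(1/t) * log(sum_n exp(t * v_n)), computed stably by factoring out max(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(t * (v - m)) for v in values)) / t

def linf_proxy(residuals, t):
    """Smooth surrogate of max_n |r_n|: smooth max over {r_n} and {-r_n}."""
    return smooth_max(list(residuals) + [-r for r in residuals], t)

def proxy_weights(residuals, t):
    """Gradient of linf_proxy w.r.t. each r_n: (e^{t r_n} - e^{-t r_n}) / normalizer."""
    m = max(abs(r) for r in residuals)            # shift for numerical stability
    ep = [math.exp(t * (r - m)) for r in residuals]
    en = [math.exp(t * (-r - m)) for r in residuals]
    z = sum(ep) + sum(en)
    return [(a - b) / z for a, b in zip(ep, en)]
```

Larger `t` makes the proxy closer to the true maximum but its gradient sparser, which is the usual trade-off when smoothing an $L_\infty$ objective.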

Appendix B Kernel defined on the Chebychev basis

In this section we describe how we model functions written in the Chebychev basis. For $h$ such a polynomial, a naive approach would simply model $f=h\circ\cos(2\pi\cdot)$ as a trigonometric polynomial. However, the Fourier decomposition of $f$ only contains cosine terms. Thus, approximating $f-f_\star$ efficiently requires a PSD model that also has only cosine terms in its Fourier decomposition. This is achieved with a kernel written in the Chebychev basis, as introduced in proposition 1, for which we now provide a proof.

Proof of proposition 1.

Let $x,y\in[-1,1]$ and let $u,v\in[0,1/2]$ be such that $x=\cos(2\pi u)$ and $y=\cos(2\pi v)$; they exist and are unique because the cosine is a bijection from $[0,\pi]$ onto $[-1,1]$. From the definition of $K$ in eq. 19 and the definition of $q$ in eq. 6, we have

$$\begin{aligned}K(x,y)&=\frac12\sum_{\omega\in\mathbb{Z}}\widehat{q}_\omega\left(e^{2\pi\mathrm{i}\omega(u+v)}+e^{2\pi\mathrm{i}\omega(u-v)}\right)\\&=\sum_{\omega\in\mathbb{Z}}\widehat{q}_\omega e^{2\pi\mathrm{i}\omega u}\cos(2\pi\omega v)\\&=\widehat{q}_0+2\sum_{\omega\geq1}\widehat{q}_\omega\cos(2\pi\omega u)\cos(2\pi\omega v)\\&=\widehat{q}_0+2\sum_{\omega\geq1}\widehat{q}_\omega H_\omega(x)H_\omega(y),\end{aligned}$$

where the third equality uses $\widehat{q}_{-\omega}=\widehat{q}_\omega$, and the last one uses $\cos(2\pi\omega u)=H_\omega(\cos(2\pi u))=H_\omega(x)$. Since $q$ has positive Fourier coefficients, this makes the feature map of $K$ explicit: $K(x,y)=\varphi(x)\cdot\varphi(y)$ with $\varphi(x)_\omega=\sqrt{(1+\mathbf{1}_{\omega\neq0})\widehat{q}_\omega}\,H_\omega(x)$ for $\omega\in\mathbb{N}$. Hence $K$ is a reproducing kernel. ∎

We now use this kernel with the Bessel function $x\mapsto e^{s(\cos(2\pi x)-1)}$, so called because its Fourier coefficients are $e^{-s}I_{|\omega|}(s)$, with $I_\omega$ the modified Bessel function of the first kind; i.e., we define the kernel $K$ on $[-1,1]$ to satisfy

$$\forall u,v\in(0,1/2),\quad K(\cos(2\pi u),\cos(2\pi v))=\frac12\left(e^{s(\cos(2\pi(u+v))-1)}+e^{s(\cos(2\pi(u-v))-1)}\right).\qquad(24)$$

As for the torus, this kernel allows an easy characterization of an RKHS containing the associated PSD model $g$.
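The closed form (24) and the feature map from the proof of proposition 1 can be compared numerically. The pure-Python sketch below assumes one dimension, $q(t)=e^{s(\cos(2\pi t)-1)}$ as above, Fourier coefficients $\widehat{q}_\omega=e^{-s}I_\omega(s)$ evaluated through the power series of $I_\omega$, and a truncation of the feature map to `n_freq` frequencies (a parameter introduced here); $\varphi(x)\cdot\varphi(y)$ should then match $K(x,y)$ up to truncation error.

```python
import math

def bessel_i(n, x, terms=60):
    """Modified Bessel function of the first kind I_n(x), by its power series."""
    return sum((x / 2) ** (2 * k + n) / (math.factorial(k) * math.factorial(k + n))
               for k in range(terms))

def kernel_closed_form(x, y, s):
    """Eq. (24): K(cos 2 pi u, cos 2 pi v) = (q(u + v) + q(u - v)) / 2."""
    u, v = math.acos(x) / (2 * math.pi), math.acos(y) / (2 * math.pi)
    q = lambda t: math.exp(s * (math.cos(2 * math.pi * t) - 1))
    return 0.5 * (q(u + v) + q(u - v))

def feature_map(x, s, n_freq=30):
    """Truncated phi(x)_w = sqrt((1 + 1_{w != 0}) q_hat_w) H_w(x), with q_hat_w = e^{-s} I_w(s)."""
    return [math.sqrt((1 + (w != 0)) * math.exp(-s) * bessel_i(w, s)) * math.cos(w * math.acos(x))
            for w in range(n_freq)]
```

Because $I_\omega(s)$ decays factorially in $\omega$, a few tens of frequencies already reproduce the kernel to near machine precision for moderate $s$.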

Lemma 3 (Chebychev coefficient of the Bessel kernel).

Let $g$ be a PSD model as in definition 1, with the kernel $K$ of eq. 24. Then, the Chebychev coefficient of index $\omega\in\mathbb{N}^d$ of $g$ can be computed in $O(rdm^2)$ time with

$$g_\omega=\sum_{i,j=1}^mR_i^\top R_j\prod_{\ell=1}^d(1+\mathbf{1}_{\omega_\ell\neq0})\frac{e^{-2\mathbf{s}_\ell}}{2}\Big[I_{\omega_\ell}(2\mathbf{s}_\ell\sigma_{-\ell ij})H_{\omega_\ell}(\sigma_{+\ell ij})+I_{\omega_\ell}(2\mathbf{s}_\ell\sigma_{+\ell ij})H_{\omega_\ell}(\sigma_{-\ell ij})\Big]\qquad(25)$$

where

$$\sigma_{\pm\ell ij}=\cos(2\pi m_{\pm\ell ij}),\quad m_{\pm\ell ij}=(\mathbf{u}_{\ell i}\pm\mathbf{u}_{\ell j})/2,\quad\text{and}\quad\cos 2\pi\mathbf{u}_{\ell i}=\mathbf{z}_{\ell i},\ \mathbf{u}_{\ell i}\in(0,1/2).$$
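The formula of eq. 25 vectorizes directly over the pairs $(i,j)$ and the dimensions $\ell$. The sketch below is our own illustration, not the authors' implementation: it assumes NumPy/SciPy, stores the $R_i$ as the rows of an $m\times r$ array `R` and the anchors $\mathbf z_{\ell i}$ in a $d\times m$ array `Z`, and evaluates $H_{\omega_\ell}(\sigma_{\pm\ell ij})$ as $\cos(2\pi\omega_\ell m_{\pm\ell ij})$. The function name `chebyshev_coeff` is hypothetical. Its cost is $O(m^2(r+d))$, within the $O(rdm^2)$ bound of the lemma.

```python
import numpy as np
from scipy.special import iv  # modified Bessel function of the first kind I_v(z)


def chebyshev_coeff(omega, R, Z, s):
    """Chebychev coefficient g_omega of a PSD model, following eq. 25 (illustrative sketch).

    omega: (d,) non-negative integers; R: (m, r) array whose rows are R_i;
    Z: (d, m) anchors z_{l i} in (-1, 1); s: (d,) kernel parameters.
    Time O(m^2 (r + d)), memory O(d m^2).
    """
    omega = np.asarray(omega)[:, None, None]            # (d, 1, 1) for broadcasting
    s = np.asarray(s, dtype=float)[:, None, None]
    U = np.arccos(Z) / (2 * np.pi)                      # u_{l i} in (0, 1/2) with cos(2 pi u) = z
    m_plus = (U[:, :, None] + U[:, None, :]) / 2        # m_{+ l i j}, shape (d, m, m)
    m_minus = (U[:, :, None] - U[:, None, :]) / 2       # m_{- l i j}
    sig_plus = np.cos(2 * np.pi * m_plus)               # sigma_{+ l i j}
    sig_minus = np.cos(2 * np.pi * m_minus)             # sigma_{- l i j}
    # H_w(sigma_pm) = H_w(cos 2 pi m_pm) = cos(2 pi w m_pm); iv accepts negative z for integer order
    bracket = (iv(omega, 2 * s * sig_minus) * np.cos(2 * np.pi * omega * m_plus)
               + iv(omega, 2 * s * sig_plus) * np.cos(2 * np.pi * omega * m_minus))
    factors = (1 + (omega != 0)) * np.exp(-2 * s) / 2 * bracket   # per-dimension factor
    gram = R @ R.T                                      # R_i^T R_j for all pairs (i, j)
    return float(np.sum(gram * np.prod(factors, axis=0)))


rng = np.random.default_rng(0)
R, Z, s = rng.standard_normal((3, 2)), rng.uniform(-0.9, 0.9, (2, 3)), np.array([0.7, 1.2])
print(chebyshev_coeff((2, 3), R, Z, s))
```

Materializing the $d\times m\times m$ arrays keeps the code short; for large $m$, one would process the pairs $(i,j)$ in chunks to bound memory.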
Proof.

Expanding $g$ and definition of the Chebychev coefficient.

From the definition of $g$ in eq. 5, we have

$$g(\mathbf{x})=\sum_{i,j=1}^{m}R_{i}^{\top}R_{j}\prod_{\ell=1}^{d}K_{\mathbf{s}_{\ell}}(\mathbf{x}_{\ell},\mathbf{z}_{\ell i})K_{\mathbf{s}_{\ell}}(\mathbf{x}_{\ell},\mathbf{z}_{\ell j}). \tag{26}$$

Since each term of eq. 26 is a product over $\ell$ of one-dimensional functions, its Chebychev coefficient of index $\omega$ is the product of the one-dimensional coefficients of index $\omega_\ell$; it therefore suffices to treat the case $d=1$.

We consider $x,y,z\in(-1,1)$ and $s>0$. We denote by $u,v,w\in(0,1/2)$ the unique values such that

$$x=\cos 2\pi u,\quad y=\cos 2\pi v,\quad z=\cos 2\pi w,$$

which exist since $t\mapsto\cos(2\pi t)$ is a bijection from $(0,1/2)$ onto $(-1,1)$. We now compute the Chebychev coefficients of $x\mapsto K_{s}(x,y)K_{s}(x,z)$. Denoting them by $p_{\omega}$, we have

$$\forall\omega\in\mathbb{N},\quad p_{\omega}=\frac{1+\mathbf{1}_{\omega\neq 0}}{\pi}\int_{-1}^{1}K_{s}(x,y)K_{s}(x,z)H_{\omega}(x)\frac{\mathrm{d}x}{\sqrt{1-x^{2}}},$$

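or equivalently, with the change of variables $x=\cos 2\pi u$ (for which $\mathrm{d}x/\sqrt{1-x^{2}}=-2\pi\,\mathrm{d}u$ and $H_{\omega}(\cos 2\pi u)=\cos(2\pi\omega u)$), and using that the integrand is invariant under $u\mapsto 1-u$ to extend the integral from $(0,1/2)$ to $(0,1)$,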

$$\forall\omega\in\mathbb{N},\quad p_{\omega}=(1+\mathbf{1}_{\omega\neq 0})\int_{0}^{1}K_{s}(\cos 2\pi u,\cos 2\pi v)K_{s}(\cos 2\pi u,\cos 2\pi w)\cos(2\pi\omega u)\,\mathrm{d}u. \tag{27}$$

Chebychev coefficient of kernel product.

With the definition of the kernel in proposition 1, eq. 19, we have

$$\begin{aligned}
K_{s}(x,y)K_{s}(x,z)&=\frac{1}{4}\big(p(u+v)+p(u-v)\big)\times\big(p(u+w)+p(u-w)\big)\\
&=\frac{e^{-2s}}{4}\big(e^{s\cos 2\pi(u+v)}+e^{s\cos 2\pi(u-v)}\big)\times\big(e^{s\cos 2\pi(u+w)}+e^{s\cos 2\pi(u-w)}\big).
\end{aligned}$$

Expanding the product and applying the sum-to-product identity $\cos a+\cos b=2\cos\frac{a+b}{2}\cos\frac{a-b}{2}$ to each exponent, we obtain

$$K_{s}(x,y)K_{s}(x,z)=\frac{e^{-2s}}{4}\Big(e^{2s\cos 2\pi(\frac{v-w}{2})\cos 2\pi(u+\frac{v+w}{2})}+e^{2s\cos 2\pi(\frac{v-w}{2})\cos 2\pi(u-\frac{v+w}{2})}+e^{2s\cos 2\pi(\frac{v+w}{2})\cos 2\pi(u+\frac{v-w}{2})}+e^{2s\cos 2\pi(\frac{v+w}{2})\cos 2\pi(u-\frac{v-w}{2})}\Big). \tag{28}$$

We simplify this expression by introducing

$$m_{\pm}=\frac{1}{2}(v\pm w)\quad\text{and}\quad\sigma_{\pm}=\cos 2\pi m_{\pm}. \tag{29}$$

Then, eq. 28 becomes

$$K_{s}(x,y)K_{s}(x,z)=\frac{e^{-2s}}{4}\Big(e^{2s\sigma_{-}\cos 2\pi(u+m_{+})}+e^{2s\sigma_{-}\cos 2\pi(u-m_{+})}+e^{2s\sigma_{+}\cos 2\pi(u+m_{-})}+e^{2s\sigma_{+}\cos 2\pi(u-m_{-})}\Big). \tag{30}$$

We recognize the form of the kernel, which is not surprising since the kernel was chosen to be stable under products. However, retrieving the exact definition of the kernel would require variables in $(0,1/2)$. Instead, we apply lemma 5 to eq. 30 and combine it with eq. 27 to obtain

$$p_{\omega}=(1+\mathbf{1}_{\omega\neq 0})\frac{e^{-2s}}{4}\Big(\cos(2\pi\omega m_{+})I_{\omega}(2s\sigma_{-})+\cos(2\pi\omega m_{+})I_{\omega}(2s\sigma_{-})+\cos(2\pi\omega m_{-})I_{\omega}(2s\sigma_{+})+\cos(2\pi\omega m_{-})I_{\omega}(2s\sigma_{+})\Big),$$

which gives

$$p_{\omega}=(1+\mathbf{1}_{\omega\neq 0})\frac{e^{-2s}}{2}\Big(\cos(2\pi\omega m_{+})I_{\omega}(2s\sigma_{-})+\cos(2\pi\omega m_{-})I_{\omega}(2s\sigma_{+})\Big). \tag{31}$$

Equation 31 gives the Chebychev coefficients of the product of two kernel functions, as defined in eq. 27. Plugging this result into the definition of $g$ in eq. 26, and noting that $\cos(2\pi\omega m_{\pm})=H_{\omega}(\cos 2\pi m_{\pm})=H_{\omega}(\sigma_{\pm})$, we obtain the result. ∎
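The step from eq. 30 to eq. 31 relies on a one-dimensional integral identity, which we take to be the content of lemma 5 (its statement is not reproduced in this section, so this is our reading). For $a\in\mathbb{R}$, $m\in\mathbb{R}$ and $\omega\in\mathbb{N}$, substituting $t=u\pm m$ and using 1-periodicity,

$$\begin{aligned}
\int_{0}^{1}e^{a\cos 2\pi(u\pm m)}\cos(2\pi\omega u)\,\mathrm{d}u
&=\int_{0}^{1}e^{a\cos 2\pi t}\cos\big(2\pi\omega(t\mp m)\big)\,\mathrm{d}t\\
&=\cos(2\pi\omega m)\int_{0}^{1}e^{a\cos 2\pi t}\cos(2\pi\omega t)\,\mathrm{d}t\pm\sin(2\pi\omega m)\int_{0}^{1}e^{a\cos 2\pi t}\sin(2\pi\omega t)\,\mathrm{d}t\\
&=\cos(2\pi\omega m)\,I_{\omega}(a).
\end{aligned}$$

The sine integral vanishes because its integrand is odd under $t\mapsto 1-t$, and the last step is the integral representation $I_{\omega}(a)=\frac{1}{\pi}\int_{0}^{\pi}e^{a\cos\theta}\cos(\omega\theta)\,\mathrm{d}\theta$ for integer $\omega$. Applying it with $(a,m)=(2s\sigma_{-},m_{+})$ and $(a,m)=(2s\sigma_{+},m_{-})$ to the four exponentials of eq. 30, and multiplying by $1+\mathbf{1}_{\omega\neq 0}$, gives the four-term expression preceding eq. 31.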

Lemma 3 shows that a model $g$ defined as in definition 1 with the Bessel kernel $K_{\mathbf{s}}$ of eq. 24 has Chebychev coefficients decaying as $O(I_{\omega}(2s))$. Hence, it belongs to $\mathcal{H}_{2\mathbf{s}}$, the RKHS associated with $K_{2\mathbf{s}}$.

Appendix C Additional details on the experiments

Tuning the hyperparameters.

The running times reported in section 4 do not include the experiments needed to find a good set of hyperparameters. The tuned parameters were the type of optimizer, the learning-rate decay, and the regularization on the Frobenius norm of $G$.

Regularization.

Regularization is performed by replacing the $HS$ norm with a proxy that is faster to compute: we use $\lVert R_{j}^{\top}R_{k}\rVert_{HS}^{2}$ instead of $\lVert\tilde{R}_{j}^{\top}Q_{jk}\tilde{R}_{k}\rVert_{HS}^{2}$ in eq. 22.

Hardware.

GloptiNets was run on NVIDIA V100 GPUs for the interpolation part, and on an Intel Xeon CPU E5-2698 v4 @ 2.20GHz for computing the certificate. TSSOS was run on an Apple M1 chip with the Mosek solver.

Configuration of TSSOS.

We use the lowest possible relaxation order $d$ (i.e. $\lceil\deg f/2\rceil$), along with chordal sparsity, and the first relaxation step of the hierarchy. In these settings, TSSOS is not guaranteed to converge to $f_{\star}$, but it runs the fastest.

Certificate vs. number of parameters for a given function.

In fig. 2, the target function is a random polynomial of norm $1$ or $2$, or a kernel mixture with $10$ coefficients of norm $1$ or $2$. The models forming the blue line are defined as in proposition 2, with rank, block size and number of blocks equal to $(1,bs,1)$ respectively, where $bs$ is the block size that we vary. The number of frequencies sampled to compute the certificate is $1.6\cdot 10^{7}$; this large value accounts for the fact that, for large models, the bound on the variance becomes larger than the MoM estimator.

Certificate vs. problem difficulty for a given model.

We have three related quantities: the quality of the optimization (given by the certificate), the expressivity of the model (given by its number of parameters), and the difficulty of the optimization (given by the norm of the function). In fig. 3, we fix the second and plot the relation between the first and the third. Here, we fix the model with parameters $(8,16,128)$, and we optimize a polynomial in dimension $3$ of degree $12$, with RKHS norm ranging from $1$ to $20$. The resulting plot exhibits a clear polynomial relation between the certificate and the norm of the function, with a slope of $-0.88$. This suggests that the certificate behaves as $O(\lVert f\rVert_{\mathcal{H}_{2\mathbf{s}}}^{1/2})$.


[Figure 3 plot: certificate (vertical axis) against $\lVert f\rVert^{2}$ (horizontal axis), log-log scale.]
Figure 3: Certificate vs. RKHS norm of $f$, for a given model $g$ with a fixed number of parameters. $f$ has 1146 coefficients and $g$ has 22528 parameters. The best certificate is kept among a set of optimization hyperparameters. As the norm of $f$ decreases, fitting $f-f_{\star}$ with $g$ becomes easier and the certificate becomes tighter.
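The parameter count in the caption can be reproduced with simple arithmetic if one assumes that each block of the model stores a rank $\times$ block-size factor together with block-size anchor points in $\mathbb{R}^{d}$. This decomposition is our hypothesis: it is consistent with the numbers above, but it is not stated in this section, and `n_params` is a hypothetical helper.

```python
def n_params(rank, block_size, n_blocks, d):
    """Hypothetical parameter count of a block model (an assumption, not the paper's code):
    each block holds a (rank x block_size) factor and block_size anchor points in R^d."""
    per_block = rank * block_size + block_size * d
    return n_blocks * per_block


# Model (8, 16, 128) of fig. 3, fitted to a polynomial in dimension 3
print(n_params(8, 16, 128, 3))  # 22528
```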

Comparison with TSSOS on the Fourier basis.

In table 1, the polynomials $f$ all have an RKHS norm of $1$. The small model is defined as in proposition 2, with rank, block size and number of blocks equal to $4$, $32$ and $8$ respectively. For the big model, these values are $8$, $128$ and $16$. The certificate is the maximum of the Chebychev bound of theorem 2 and the MoM bound of theorem 3. The number of frequencies sampled is $3.2\cdot 10^{7}$.

Comparison with TSSOS on the Chebychev basis.

We compare GloptiNets with TSSOS on random Chebychev polynomials in table 2, similarly to the comparison with trigonometric polynomials in table 1. Minimizing polynomials defined on the canonical basis is easier: contrary to trigonometric polynomials, there is no need to account for the imaginary part of the variable. Indeed, if $d$ is the dimension, complex polynomials are encoded in TSSOS with a variable of dimension $2d$, following the definition of Hermitian sum-of-squares introduced in Josz and Molzahn [2018]. The random polynomials we consider are characterized by the dimension $d$ and their number of coefficients $n$: instead of bounding the degree, we use all the basis elements $H_{\omega}(\mathbf{x})=\prod_{\ell=1}^{d}H_{\omega_{\ell}}(\mathbf{x}_{\ell})$ for which $\lVert\omega\rVert_{\infty}\leq p$, so that the maximum degree is $dp$. The RKHS norm of $f$ is fixed to $1$. As with the comparison on trigonometric polynomials in table 1, we see that GloptiNets provides similar certificates regardless of the number of coefficients in $f$. Even though it lags behind TSSOS for small polynomials, it handles large polynomials that are intractable for TSSOS. The “small” and “big” models have the same structure as in the trigonometric polynomial experiments.
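The basis used for these random polynomials can be enumerated directly. The sketch below (function names are ours) lists all multi-indices with $\lVert\omega\rVert_{\infty}\leq p$, excluding $\omega=0$; excluding the constant term is an assumption on our part, which reproduces the values of $n$ in table 2, since $(p+1)^{4}-1=255,624,1295$ for $p=3,4,5$. It also evaluates $f(\mathbf{x})=\sum_{\omega}c_{\omega}H_{\omega}(\mathbf{x})$ through $H_{k}(\cos\theta)=\cos(k\theta)$.

```python
import numpy as np
from itertools import product


def chebyshev_multi_indices(d, p):
    """All omega in {0, ..., p}^d with omega != 0, i.e. ||omega||_inf <= p (constant term excluded)."""
    return [w for w in product(range(p + 1), repeat=d) if any(w)]


def eval_chebyshev_poly(coeffs, omegas, x):
    """f(x) = sum_k c_k prod_l H_{omega_kl}(x_l), using H_k(cos t) = cos(k t) for x in [-1, 1]^d."""
    theta = np.arccos(np.asarray(x, dtype=float))   # (d,)
    W = np.asarray(omegas)                           # (n, d)
    return float(np.asarray(coeffs) @ np.prod(np.cos(W * theta), axis=1))


omegas = chebyshev_multi_indices(4, 3)
print(len(omegas), max(sum(w) for w in omegas))     # 255 12, i.e. n and the maximum degree d * p
```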

Table 2: GloptiNets and TSSOS on random Chebychev polynomials. The same conclusion as in table 1 applies. While TSSOS is very efficient on small problems, its memory requirements grow exponentially with the problem size. GloptiNets is less accurate, but its computational burden does not increase with the problem size.
| $d$ | $p$ | $n$ | TSSOS Certif. | TSSOS $t$ | GN-small Certif. | GN-small $t$ | GN-big Certif. | GN-big $t$ |
|---|---|---|---|---|---|---|---|---|
| 4 | 3 | 255 | $3.4\cdot 10^{-7}$ | 6 | $1.1\cdot 10^{-2}$ | $2\cdot 10^{2}$ | $4.1\cdot 10^{-3}$ | $1\cdot 10^{3}$ |
| 4 | 4 | 624 | $2.1\cdot 10^{-9}$ | 153 | $2.5\cdot 10^{-2}$ | $2\cdot 10^{2}$ | $3.6\cdot 10^{-3}$ | $1\cdot 10^{3}$ |
| 4 | 5 | 1295 | Out of memory! | - | $1.8\cdot 10^{-2}$ | $2\cdot 10^{2}$ | $4.2\cdot 10^{-3}$ | $2\cdot 10^{3}$ |

Sampling from the Bessel distribution.

The function $\omega\mapsto e^{-s}I_{\omega}(s)$ decays rapidly. In fact, with $s=2$, which is the value used to generate the random polynomials, it falls below machine precision as soon as $\omega>14$. Thus, we approximate the distribution with a finitely supported one, with weights $I_{\omega}(s)$ for the $\omega$ such that $e^{-s}I_{\omega}(s)$ is above machine precision. We then extend it to multiple dimensions with a tensor product. Finally, we use a hash table to store the frequencies already sampled, which makes the evaluation of millions of frequencies much faster: for instance, in dimension $5$, sampling $10^{6}$ frequencies from the Bessel distribution of parameter $s=2$ on $\mathbb{N}^{5}$ yields only $\approx 10^{4}$ unique frequencies. This allows for tighter certificates, since a much larger number of samples $N$ can be used, which makes the r.h.s. of eq. 9, in $1/N$, much smaller. Note that the time to generate this hash table, of the order of a few seconds, is not reported in tables 1 and 2.
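A minimal sketch of this sampling procedure, assuming weights proportional to $e^{-s}I_{\omega}(s)$ on $\mathbb{N}$ truncated at double-precision machine epsilon, as described above. The helpers `bessel_weights` and `sample_frequencies` are hypothetical names; `scipy.special.ive` is SciPy's exponentially scaled Bessel function $I_{v}(z)e^{-|z|}$. Coordinates are drawn independently (the tensor product), and a `collections.Counter` plays the role of the hash table, mapping each distinct frequency to its multiplicity so that each one is evaluated only once.

```python
import numpy as np
from collections import Counter
from scipy.special import ive  # ive(v, z) = iv(v, z) * exp(-|z|)


def bessel_weights(s, k_max=100, eps=np.finfo(float).eps):
    """Truncated weights on N: w_k proportional to e^{-s} I_k(s), kept while above eps."""
    w = ive(np.arange(k_max + 1), s)      # e^{-s} I_k(s), computed without overflow
    w = w[w > eps]                         # decreasing in k, so this keeps a prefix
    return w / w.sum()                     # renormalize the truncated distribution


def sample_frequencies(s, d, n_samples, seed=0):
    """Tensor-product sampling on N^d; the Counter maps each distinct frequency to its count."""
    rng = np.random.default_rng(seed)
    w = bessel_weights(s)
    omegas = rng.choice(len(w), size=(n_samples, d), p=w)   # i.i.d. coordinates
    return Counter(map(tuple, omegas.tolist()))


counts = sample_frequencies(s=2.0, d=5, n_samples=10**6)
print(len(counts))  # number of distinct frequencies that actually need to be evaluated
```

An empirical average over the $N$ samples can then be computed as a count-weighted sum over the distinct frequencies, which gives the same value at a fraction of the cost.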

Optimizing a kernel mixture.

As is the case for polynomials, when optimizing a function of the form $h(x)=\sum_{i=1}^{n}\alpha_{i}K(x_{i},x)$, the certificate provided by GloptiNets depends only on the norm $\lVert h\rVert_{\mathcal{H}}^{2}$ and not, e.g., on the number of coefficients $n$. This is illustrated in fig. 4.

[Figure 4 plot: certificate against number of parameters (# params) of $g$, log-log scale; curves for $\lVert h\rVert_{\mathcal{H}}\in\{1,2\}$ and $n\in\{10,100\}$.]
Figure 4: Certificate vs. number of parameters in $g$ when certifying mixtures of Bessel functions, characterized by their RKHS norm ($1$ in blue, $2$ in red) and their number of coefficients ($10$ in circles, $100$ in rectangles). As with polynomials, this shows that GloptiNets is only sensitive to the former, and not to the way the function is represented. We are not aware of other algorithms able to certify this class of functions.
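For a kernel mixture, the RKHS norm has the closed form $\lVert h\rVert_{\mathcal{H}}^{2}=\alpha^{\top}G\alpha$ with $G_{ij}=K(x_{i},x_{j})$, which makes it easy to generate test functions with a prescribed norm whatever their number of coefficients $n$. The sketch below is our own illustration: it uses a product of one-dimensional kernels $K_{s}(\cos 2\pi u,\cos 2\pi v)=\tfrac{1}{2}\big(p(u+v)+p(u-v)\big)$ with $p(t)=e^{s(\cos 2\pi t-1)}$, which is how we read the kernel from the display preceding eq. 28. The kernel, the value $s=2$, the sampling of anchors and weights, and all function names are assumptions, not the setup of fig. 4.

```python
import numpy as np


def kernel_1d(x, y, s):
    """Assumed 1D kernel on [-1, 1]: with x = cos 2πu and y = cos 2πv,
    K_s(x, y) = (p(u + v) + p(u - v)) / 2 where p(t) = exp(s (cos 2πt - 1))."""
    u = np.arccos(x)[:, None] / (2 * np.pi)
    v = np.arccos(y)[None, :] / (2 * np.pi)
    p = lambda t: np.exp(s * (np.cos(2 * np.pi * t) - 1))
    return 0.5 * (p(u + v) + p(u - v))


def gram(X, Y, s):
    """Product kernel on [-1, 1]^d, K(x, y) = prod_l K_s(x_l, y_l), as a (len(X), len(Y)) matrix."""
    G = np.ones((X.shape[0], Y.shape[0]))
    for l in range(X.shape[1]):
        G *= kernel_1d(X[:, l], Y[:, l], s)
    return G


def random_mixture(n, d, target_norm, s=2.0, seed=0):
    """Random h = sum_i alpha_i K(x_i, .), rescaled so that sqrt(alpha^T G alpha) = target_norm."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1, 1, size=(n, d))
    alpha = rng.standard_normal(n)
    alpha *= target_norm / np.sqrt(alpha @ gram(X, X, s) @ alpha)
    return X, alpha


# Evaluate h(x) = sum_i alpha_i K(x_i, x) at a few points
X, alpha = random_mixture(n=100, d=3, target_norm=2.0)
x_eval = np.zeros((4, 3))
h_vals = gram(x_eval, X, 2.0) @ alpha
```

Since the rescaling only uses $\alpha^{\top}G\alpha$, mixtures with $n=10$ and $n=100$ terms end up with exactly the same norm, which is the quantity the certificate is expected to depend on.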

Appendix D Fourier coefficients in linear time

Lemma 4 (Fourier coefficient of the Bessel kernel in linear time).

Let $g$ be an extended k-SoS model as in definition 1. Then, its Fourier coefficients $\widehat{g}_{\omega}$ can be evaluated in time linear in $m$ with

$$\widehat{g}_{\omega}=\sum_{k=1}^{r}\sum_{n\in\mathbb{Z}^{d}}\left(\sum_{i=1}^{m}R_{ki}\prod_{\ell=1}^{d}\phi_{\ell,-}(\mathbf{z}_{i\ell})_{n_{\ell}}\right)\cdot\left(\sum_{i=1}^{m}R_{ki}\prod_{\ell=1}^{d}\phi_{\ell,+}(\mathbf{z}_{i\ell})_{n_{\ell}}\right) \tag{32}$$

where

$$\forall n\in\mathbb{Z},\ z\in\mathbb{T},\ \ell\in[d],\quad\phi_{\ell,\pm}(z)_{n}=\sqrt{q_{\ell,n}}\,e^{\pi\mathrm{i}(n\pm\omega_{\ell})z}$$

and $q_{\cdot,\cdot}(s)$ is defined in lemma 6.
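The mechanism that makes eq. 32 linear in $m$ is a feature expansion of the kernel: a double sum over pairs $(i,j)$ becomes a sum over the expansion index $n$ of products of single sums over $i$. As a simplified one-dimensional illustration (not eq. 32 itself, and not the $q_{\ell,n}$ of lemma 6), the sketch below uses the shift-invariant kernel $k(x,y)=e^{s(\cos 2\pi(x-y)-1)}$ and the classical expansion $e^{s\cos\theta}=\sum_{n\in\mathbb{Z}}I_{n}(s)e^{\mathrm{i}n\theta}$, truncated to $|n|\leq$ `n_max` (a hypothetical parameter), so that $\sum_{i,j}a_{i}a_{j}k(z_{i},z_{j})=\sum_{n}e^{-s}I_{|n|}(s)\,\lvert\sum_{i}a_{i}e^{2\pi\mathrm{i}nz_{i}}\rvert^{2}$ costs $O(m\cdot n_{\max})$ instead of $O(m^{2})$.

```python
import numpy as np
from scipy.special import ive  # ive(v, z) = iv(v, z) * exp(-|z|)


def double_sum_quadratic(a, z, s):
    """O(m^2) reference: sum_{i,j} a_i a_j k(z_i, z_j), k(x, y) = exp(s (cos 2π(x - y) - 1))."""
    K = np.exp(s * (np.cos(2 * np.pi * (z[:, None] - z[None, :])) - 1))
    return float(a @ K @ a)


def double_sum_linear(a, z, s, n_max=30):
    """O(m n_max): expands exp(s cos θ) = sum_n I_n(s) e^{i n θ}, truncated to |n| <= n_max."""
    n = np.arange(-n_max, n_max + 1)
    q = ive(np.abs(n), s)                              # e^{-s} I_|n|(s) >= 0 for s > 0
    S = a @ np.exp(2j * np.pi * np.outer(z, n))        # S_n = sum_i a_i e^{2πi n z_i}
    return float(np.sum(q * np.abs(S) ** 2))


rng = np.random.default_rng(0)
a, z = rng.standard_normal(2000), rng.uniform(0, 1, 2000)
print(double_sum_quadratic(a, z, 2.0), double_sum_linear(a, z, 2.0))  # equal up to rounding
```

The truncation error is governed by the tail of $I_{n}(s)$, which decays faster than geometrically in $n$; this is the same reason the sum over $n\in\mathbb{Z}^{d}$ in eq. 32 can be approximated by a finite set of indices.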

Lemma 4 provides a formula for computing $\widehat{g}_{\omega}$ that is linear in $m$, but that still requires a numerical approximation of the sum over $n\in\mathbb{Z}^{d}$. For instance, restricting the sum to the hyperbolic cross Dũng et al. [2017]

$$\mathrm{HC}(d,n)=\Big\{\omega\in\mathbb{Z}^{d};\ \prod_{\ell=1}^{d}\max\{1,|\omega_{\ell}|\}\leq n\Big\}$$

would result in a complexity of $O(n(\log n)^{d-1}mr)$, and should already produce reasonably accurate estimates of $\widehat{g}_{\omega}$ for small $n$.
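The hyperbolic cross is easy to enumerate for small $d$ and $n$. The brute-force sketch below (a hypothetical helper of ours) filters the box $[-n,n]^{d}$, which contains $\mathrm{HC}(d,n)$ since each factor $\max\{1,|\omega_{\ell}|\}$ is at least $1$, and prints the size of the set next to the size $(2n+1)^{d}$ of the box, to show how much smaller the truncated sum is.

```python
from itertools import product
from math import prod


def hyperbolic_cross(d, n):
    """HC(d, n) = {omega in Z^d : prod_l max(1, |omega_l|) <= n}.
    Brute force over the box [-n, n]^d, which contains HC(d, n); fine for small d and n."""
    box = range(-n, n + 1)
    return [w for w in product(box, repeat=d)
            if prod(max(1, abs(x)) for x in w) <= n]


for d, n in [(2, 8), (3, 8)]:
    print(d, n, len(hyperbolic_cross(d, n)), (2 * n + 1) ** d)
```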

Furthermore, since $q$ is real and even with respect to $n$, the inner product in eq. 36 can be computed using only half of the terms.
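Here is a minimal sketch of this halving. It assumes a generic real, even sequence truncated to $|n|\leq N$; the values below are illustrative and are not the $q_{\ell,n}$ of lemma 6. Once the factor $e^{-\pi\mathrm{i}\omega(x+y)}$ is pulled out, the sum over $n\in\mathbb{Z}$ collapses into a cosine series over $n\geq 0$.

```python
import cmath
import math


def full_sum(q, omega, x, y):
    """sum_n q_n e^{i pi (n - omega) x} e^{-i pi (n + omega) y} over all stored n."""
    return sum(qn * cmath.exp(1j * math.pi * (n - omega) * x)
               * cmath.exp(-1j * math.pi * (n + omega) * y)
               for n, qn in q.items())


def half_sum(q, omega, x, y):
    """Same quantity, visiting only n >= 0 thanks to q_{-n} = q_n (real)."""
    N = max(q)
    even_part = q[0] + 2 * sum(q[n] * math.cos(math.pi * n * (x - y))
                               for n in range(1, N + 1))
    return cmath.exp(-1j * math.pi * omega * (x + y)) * even_part


# hypothetical real, even coefficients, truncated to |n| <= 10
q = {n: 1.0 / (1 + n * n) for n in range(-10, 11)}
```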

Proof.

From lemma 1, we have that

$$\widehat{g}_{\omega}=\sum_{i,j=1}^{m}R_{i}^{\top}R_{j}\prod_{\ell=1}^{d}e^{-2\mathbf{s}_{\ell}}I_{|\omega_{\ell}|}\big(2\mathbf{s}_{\ell}\cos\pi(\mathbf{z}_{i\ell}-\mathbf{z}_{j\ell})\big)\,e^{-\mathrm{i}\pi\omega_{\ell}(\mathbf{z}_{i\ell}+\mathbf{z}_{j\ell})}.\tag{33}$$

Introducing

$$f_{\ell}(x,y)=e^{-2\mathbf{s}_{\ell}}I_{|\omega_{\ell}|}\big(2\mathbf{s}_{\ell}\cos\pi(x-y)\big)\,e^{-\mathrm{i}\pi\omega_{\ell}(x+y)},\tag{34}$$

eq. 33 simplifies to

$$\widehat{g}_{\omega}=\sum_{i,j=1}^{m}R_{i}^{\top}R_{j}\prod_{\ell=1}^{d}f_{\ell}(\mathbf{z}_{i\ell},\mathbf{z}_{j\ell}).\tag{35}$$
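For reference, eq. 35 can be evaluated directly with a double loop over $i,j$. The cost is quadratic in $m$, and this is the baseline that the factorized form of lemma 4 avoids. The plain-Python sketch below makes three assumptions:

- $R$ is stored as an $r\times m$ list of rows, so that $R_i^\top R_j=\sum_k R_{ki}R_{kj}$.
- The points $\mathbf{z}_i$ are lists of length $d$.
- $I_{\nu}$ is computed from its power series.

All names are illustrative.

```python
import cmath
import math


def bessel_i(order, x, terms=60):
    """Modified Bessel function I_order(x), integer order >= 0, by its power series."""
    return sum((x / 2) ** (2 * p + order)
               / (math.factorial(p) * math.factorial(p + order)) for p in range(terms))


def f_l(omega_l, s_l, x, y):
    """Factor f_l(x, y) of eq. 34."""
    return (math.exp(-2 * s_l)
            * bessel_i(abs(omega_l), 2 * s_l * math.cos(math.pi * (x - y)))
            * cmath.exp(-1j * math.pi * omega_l * (x + y)))


def g_hat_direct(omega, R, Z, s):
    """Eq. 35 by its O(m^2 d r) double sum.

    R: r x m real matrix (list of rows); Z: m points of T^d (list of lists);
    s: the d scale parameters; omega: frequency in Z^d.
    """
    r, m, d = len(R), len(Z), len(omega)
    total = 0j
    for i in range(m):
        for j in range(m):
            rij = sum(R[k][i] * R[k][j] for k in range(r))  # R_i^T R_j
            prod = 1 + 0j
            for l in range(d):
                prod *= f_l(omega[l], s[l], Z[i][l], Z[j][l])
            total += rij * prod
    return total
```

Each $f_\ell$ has a real Bessel factor, so flipping the sign of $\omega$ conjugates $\widehat{g}_\omega$. This gives a cheap consistency check.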

Using lemma 6 with $s=\mathbf{s}_{\ell}$, $\omega=|\omega_{\ell}|$ and $z=(x-y)/2$, for any $x,y\in\mathbb{T}$,

$$e^{-2\mathbf{s}_{\ell}}I_{|\omega_{\ell}|}\big(2\mathbf{s}_{\ell}\cos\pi(x-y)\big)=\sum_{n\in\mathbb{Z}}q_{\ell,n}\,e^{\pi\mathrm{i}n(x-y)}$$

where $q_{\ell,n}$ depends on $\mathbf{s}_{\ell}$ and $\omega_{\ell}$, so that $f_{\ell}$ defined in eq. 34 can be written as

$$\begin{aligned}
f_{\ell}(x,y)&=\sum_{n\in\mathbb{Z}}q_{\ell,n}\,e^{\pi\mathrm{i}n(x-y)}e^{-\pi\mathrm{i}\omega_{\ell}(x+y)}\\
&=\sum_{n\in\mathbb{Z}}q_{\ell,n}\,e^{\pi\mathrm{i}(n-\omega_{\ell})x}e^{-\pi\mathrm{i}(n+\omega_{\ell})y}\\
&=\phi_{\ell,-}(x)\cdot\phi_{\ell,+}(y)
\end{aligned}\tag{36}$$

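where $a\cdot b=\sum_{n}a_{n}\overline{b_{n}}$ denotes the Hermitian inner product between sequences. For any $\ell\in\{1,\dots,d\}$ and $z\in\mathbb{T}$, we define the following; the $q_{\ell,n}$ are non-negative by lemma 6, so their square roots are real: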

$$\phi_{\ell,\pm}(z)=\Big(\sqrt{q_{\ell,n}}\,e^{\pi\mathrm{i}(n\pm\omega_{\ell})z}\Big)_{n\in\mathbb{Z}}.\tag{37}$$
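The factorization in eq. 36 can be checked numerically. The sketch truncates the embeddings of eq. 37 to $|n|\leq N$ and computes $q_{\ell,n}$ from lemma 6 with $s=\mathbf{s}_\ell$ and $\omega=|\omega_\ell|$. It is plain Python and makes these assumptions:

- the Hermitian inner product $a\cdot b=\sum_n a_n\overline{b_n}$;
- truncation levels picked for moderate $\mathbf{s}_\ell$;
- hypothetical helper names.

```python
import cmath
import math


def bessel_i(order, x, terms=60):
    """Modified Bessel function I_order(x), integer order >= 0, by its power series."""
    return sum((x / 2) ** (2 * p + order)
               / (math.factorial(p) * math.factorial(p + order)) for p in range(terms))


def q_coef(omega, n, s, terms=40):
    """q_{omega,n} of lemma 6 (omega >= 0), with the series over p truncated."""
    n = abs(n)  # q_{omega,-n} = q_{omega,n}
    if (n - omega) % 2:
        return 0.0  # n and omega must have the same parity
    shift = (n - omega) // 2
    p0 = max(0, shift)
    return math.exp(-2 * s) * sum(
        (s / 2) ** (2 * p + omega) * math.comb(2 * p + omega, p - shift)
        / (math.factorial(p) * math.factorial(p + omega))
        for p in range(p0, p0 + terms))


def f_l(omega_l, s_l, x, y):
    """Left-hand side of eq. 36, i.e. f_l(x, y) as defined in eq. 34."""
    return (math.exp(-2 * s_l)
            * bessel_i(abs(omega_l), 2 * s_l * math.cos(math.pi * (x - y)))
            * cmath.exp(-1j * math.pi * omega_l * (x + y)))


def phi(omega_l, s_l, z, sign, N):
    """Truncated embedding (phi_{l,sign}(z)_n)_{|n| <= N} of eq. 37, sign in {+1, -1}."""
    return [math.sqrt(q_coef(abs(omega_l), n, s_l))
            * cmath.exp(1j * math.pi * (n + sign * omega_l) * z)
            for n in range(-N, N + 1)]


def dot(a, b):
    """Hermitian inner product a . b = sum_n a_n conj(b_n)."""
    return sum(an * bn.conjugate() for an, bn in zip(a, b))
```

Up to truncation, `dot(phi(w, s, x, -1, N), phi(w, s, y, +1, N))` should match `f_l(w, s, x, y)`.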

Let $\phi_{\pm}:\mathbb{T}^{d}\to(\mathbb{Z}^{d}\to\mathbb{C})$ be the tensor product of the $\phi_{\ell,\pm}$, i.e. $\phi_{\pm}(z)=\phi_{1,\pm}(z_{1})\otimes\dots\otimes\phi_{d,\pm}(z_{d})$, so that $\phi_{\pm}(z)_{n}=\prod_{\ell=1}^{d}\phi_{\ell,\pm}(z_{\ell})_{n_{\ell}}$ for $n\in\mathbb{Z}^{d}$. Then eq. 36 allows us to write $\widehat{g}_{\omega}$ in eq. 35 as

$$\begin{aligned}
\widehat{g}_{\omega}&=\sum_{i,j=1}^{m}\sum_{k=1}^{r}R_{ki}R_{kj}\,\phi_{-}(\mathbf{z}_{i})\cdot\phi_{+}(\mathbf{z}_{j})\\
&=\sum_{k=1}^{r}\Big[\sum_{i=1}^{m}R_{ki}\,\phi_{-}(\mathbf{z}_{i})\Big]\cdot\Big[\sum_{i=1}^{m}R_{ki}\,\phi_{+}(\mathbf{z}_{i})\Big]\\
&=\sum_{k=1}^{r}\Big[\sum_{i=1}^{m}R_{ki}\,\phi_{1,-}(\mathbf{z}_{i1})\otimes\dots\otimes\phi_{d,-}(\mathbf{z}_{id})\Big]\cdot\Big[\sum_{i=1}^{m}R_{ki}\,\phi_{1,+}(\mathbf{z}_{i1})\otimes\dots\otimes\phi_{d,+}(\mathbf{z}_{id})\Big]\\
&=\sum_{k=1}^{r}\sum_{n\in\mathbb{Z}^{d}}\Big(\sum_{i=1}^{m}R_{ki}\prod_{\ell=1}^{d}\phi_{\ell,-}(\mathbf{z}_{i\ell})_{n_{\ell}}\Big)\,\overline{\Big(\sum_{i=1}^{m}R_{ki}\prod_{\ell=1}^{d}\phi_{\ell,+}(\mathbf{z}_{i\ell})_{n_{\ell}}\Big)}
\end{aligned}$$

which is the desired result. ∎
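The last line of the proof can be turned into an algorithm whose cost is linear in $m$ for each retained multi-index $n$. The sketch below compares the result with the direct double sum of eq. 35. It makes three assumptions:

- For simplicity, $n$ is truncated to the box $\{-N,\dots,N\}^d$. The box has $(2N+1)^d$ elements, which is why a hyperbolic cross is preferable for larger $d$.
- Helper names are hypothetical.
- Truncation levels are chosen for moderate $\mathbf{s}_\ell$.

```python
import cmath
import itertools
import math


def bessel_i(order, x, terms=60):
    """Modified Bessel function I_order(x), integer order >= 0, by its power series."""
    return sum((x / 2) ** (2 * p + order)
               / (math.factorial(p) * math.factorial(p + order)) for p in range(terms))


def q_coef(omega, n, s, terms=40):
    """q_{omega,n} of lemma 6 (omega >= 0), with the series over p truncated."""
    n = abs(n)
    if (n - omega) % 2:
        return 0.0
    shift = (n - omega) // 2
    p0 = max(0, shift)
    return math.exp(-2 * s) * sum(
        (s / 2) ** (2 * p + omega) * math.comb(2 * p + omega, p - shift)
        / (math.factorial(p) * math.factorial(p + omega)) for p in range(p0, p0 + terms))


def g_hat_direct(omega, R, Z, s):
    """Eq. 35: double sum over i, j, quadratic in m."""
    total = 0j
    for i, zi in enumerate(Z):
        for j, zj in enumerate(Z):
            rij = sum(row[i] * row[j] for row in R)
            prod = 1 + 0j
            for w, sl, x, y in zip(omega, s, zi, zj):
                prod *= (math.exp(-2 * sl) * bessel_i(abs(w), 2 * sl * math.cos(math.pi * (x - y)))
                         * cmath.exp(-1j * math.pi * w * (x + y)))
            total += rij * prod
    return total


def g_hat_factorized(omega, R, Z, s, N):
    """Last line of the proof of lemma 4, with n restricted to {-N, ..., N}^d."""
    d, ns = len(omega), range(-N, N + 1)
    sq = [[math.sqrt(q_coef(abs(w), n, sl)) for n in ns] for w, sl in zip(omega, s)]

    def phi(l, z, sign):  # phi_{l,sign}(z)_n for n in ns
        return [sq[l][t] * cmath.exp(1j * math.pi * (n + sign * omega[l]) * z)
                for t, n in enumerate(ns)]

    phi_minus = [[phi(l, z[l], -1) for l in range(d)] for z in Z]
    phi_plus = [[phi(l, z[l], +1) for l in range(d)] for z in Z]
    total = 0j
    for row in R:  # one term per k = 1..r
        for idx in itertools.product(range(len(ns)), repeat=d):
            a = b = 0j
            for rki, pm, pp in zip(row, phi_minus, phi_plus):  # linear in m
                prod_m = prod_p = 1 + 0j
                for l in range(d):
                    prod_m *= pm[l][idx[l]]
                    prod_p *= pp[l][idx[l]]
                a += rki * prod_m
                b += rki * prod_p
            total += a * b.conjugate()
    return total
```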

Appendix E Other computations

Lemma 5.

Let $f$ be the function defined on $(-1,1)$ by

$$\forall u\in(0,1/2),\quad f(\cos 2\pi u)=e^{s\cos 2\pi(u-v)}.\tag{38}$$

Then its Chebychev coefficients are given by

$$f_{\omega}=(1+\mathbf{1}_{\omega\neq 0})\cos(2\pi\omega v)I_{\omega}(s).\tag{39}$$
Proof.

Let $\omega\in\mathbb{N}_{*}$. The $\omega$-th coefficient of a function $f$ in the Chebychev basis is given by

$$f_{\omega}=\frac{2}{\pi}\int_{-1}^{1}f(x)T_{\omega}(x)\frac{\mathrm{d}x}{\sqrt{1-x^{2}}},$$

which we conveniently rewrite, using the classical change of variable $x=\cos 2\pi u$, as

$$f_{\omega}=2\int_{I_{1}}f(\cos 2\pi u)\cos(2\pi\omega u)\,\mathrm{d}u\tag{40}$$

which is valid for any interval $I_{1}\subset\mathbb{R}$ of length $1$.
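Eq. 40 can be sanity-checked numerically by evaluating each side with a quadrature suited to its form. The sketch uses Gauss-Chebyshev quadrature for the weighted integral on $(-1,1)$ and the trapezoidal rule for the periodic integral over $I_1=[0,1)$. It is plain Python, and the test functions are chosen for illustration. Both sides carry the factor $2$ appropriate for $\omega\geq 1$.

```python
import math


def cheb_coef_definition(f, omega, K=64):
    """(2/pi) int_{-1}^{1} f(x) T_omega(x) dx / sqrt(1 - x^2), Gauss-Chebyshev quadrature."""
    total = 0.0
    for k in range(1, K + 1):
        theta = (2 * k - 1) * math.pi / (2 * K)  # node x_k = cos(theta_k)
        total += f(math.cos(theta)) * math.cos(omega * theta)  # T_omega(x_k) = cos(omega theta_k)
    return 2 * total / K


def cheb_coef_eq40(f, omega, K=64):
    """2 int_0^1 f(cos 2 pi u) cos(2 pi omega u) du (eq. 40), trapezoidal rule on one period."""
    h = 1.0 / K
    return 2 * h * sum(f(math.cos(2 * math.pi * k * h)) * math.cos(2 * math.pi * omega * k * h)
                       for k in range(K))
```

For $f(x)=x^{3}=\tfrac{3}{4}T_{1}(x)+\tfrac{1}{4}T_{3}(x)$, both functions return $3/4$ for $\omega=1$ and $1/4$ for $\omega=3$, up to rounding.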

Now, for $s>0$, consider the function $f$ defined on $(-1,1)$ by $x\mapsto e^{s\cos(\arccos(x)-2\pi v)}$, or equivalently

$$\forall u\in(0,1/2),\quad f(\cos 2\pi u)=e^{s\cos 2\pi(u-v)}.\tag{41}$$

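Plugging eq. 41 into eq. 40 and then substituting $u\mapsto u+v$, we obtain the following. The substitution is allowed because the integrand is $1$-periodic, so its integral is the same over any interval of length $1$.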

$$\begin{aligned}
f_{\omega}&=2\int_{I_{1}}e^{s\cos 2\pi(u-v)}\cos(2\pi\omega u)\,\mathrm{d}u\\
&=2\int_{I_{1}}e^{s\cos 2\pi u}\cos\big(2\pi\omega(u+v)\big)\,\mathrm{d}u\\
&=2\int_{I_{1}}e^{s\cos 2\pi u}\cos(2\pi\omega u)\cos(2\pi\omega v)\,\mathrm{d}u-2\int_{I_{1}}e^{s\cos 2\pi u}\sin(2\pi\omega u)\sin(2\pi\omega v)\,\mathrm{d}u.
\end{aligned}$$

The integrand of the last term is odd in $u$. Hence it integrates to $0$ on any interval centered at $0$, such as $I_{1}=(-1/2,1/2)$. Hence,

$$f_{\omega}=2\cos(2\pi\omega v)\int_{I_{1}}e^{s\cos 2\pi u}\cos(2\pi\omega u)\,\mathrm{d}u.\tag{42}$$

We recognize the integral that defines the modified Bessel function of the first kind in eq. 14. Plugging it into eq. 42, we obtain

$$f_{\omega}=2\cos(2\pi\omega v)I_{\omega}(s)=2I_{\omega}(s)\,T_{\omega}(\cos 2\pi v).\tag{43}$$

If $\omega=0$, the definition in eq. 40 carries an additional factor $1/2$, which yields

$$f_{0}=I_{0}(s).\tag{44}$$

∎
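The chain of equalities from the first display after eq. 41 to eq. 43 relies only on two things: the $1$-periodicity of the integrand and the integral form of $I_\omega$. The minimal plain-Python check below takes eq. 14, which is not reproduced here, to be the representation $I_{\omega}(s)=\int_{0}^{1}e^{s\cos 2\pi u}\cos(2\pi\omega u)\,\mathrm{d}u$. It evaluates the full-period integral $2\int_{I_1}e^{s\cos 2\pi(u-v)}\cos(2\pi\omega u)\,\mathrm{d}u$ with the trapezoidal rule. It then compares the result with $2\cos(2\pi\omega v)I_{\omega}(s)$, where $I_\omega$ is summed from its power series.

```python
import math


def bessel_i(order, x, terms=60):
    """Modified Bessel function I_order(x), integer order >= 0, by its power series."""
    return sum((x / 2) ** (2 * p + order)
               / (math.factorial(p) * math.factorial(p + order)) for p in range(terms))


def shifted_integral(omega, s, v, K=128):
    """2 int_0^1 e^{s cos 2 pi (u - v)} cos(2 pi omega u) du, trapezoidal rule on one period."""
    h = 1.0 / K
    return 2 * h * sum(math.exp(s * math.cos(2 * math.pi * (k * h - v)))
                       * math.cos(2 * math.pi * omega * k * h) for k in range(K))


def closed_form(omega, s, v):
    """Right-hand side 2 cos(2 pi omega v) I_omega(s) of eq. 43."""
    return 2 * math.cos(2 * math.pi * omega * v) * bessel_i(omega, s)
```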

Lemma 6 (Fourier decomposition of Bessel composed with cosine).

Let $s>0$, $\omega\in\mathbb{N}$ and $z\in\mathbb{T}$. Then,

$$e^{-2s}I_{\omega}(2s\cos 2\pi z)=\sum_{n\in\mathbb{Z}}q_{\omega,n}\,e^{2\pi\mathrm{i}nz},\tag{45}$$

where, for all $n\geq 0$,

$$q_{\omega,n}=\begin{cases}e^{-2s}\displaystyle\sum_{p\geq(\frac{n-\omega}{2})_{+}}\frac{(s/2)^{2p+\omega}}{p!\,(p+\omega)!}\binom{2p+\omega}{p-\frac{n-\omega}{2}} & \text{if } n\equiv\omega\ (\mathrm{mod}\ 2),\\ 0 & \text{otherwise,}\end{cases}$$

and $q_{\omega,-n}=q_{\omega,n}$, since the left-hand side of eq. 45 is even in $z$.
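Lemma 6 can be checked numerically. The sketch compares the series for $q_{\omega,n}$ against the Fourier coefficients of $z\mapsto e^{-2s}I_{\omega}(2s\cos 2\pi z)$, computed with the trapezoidal rule on one period. It is plain Python, and the truncation levels (`terms`, `K`) are assumptions chosen for small $s$.

```python
import cmath
import math


def bessel_i(order, x, terms=60):
    """Modified Bessel function I_order(x), integer order >= 0, by its power series."""
    return sum((x / 2) ** (2 * p + order)
               / (math.factorial(p) * math.factorial(p + order)) for p in range(terms))


def q_coef(omega, n, s, terms=40):
    """q_{omega,n} from eq. 45 (omega >= 0), with the series over p truncated."""
    n = abs(n)  # evenness: q_{omega,-n} = q_{omega,n}
    if (n - omega) % 2:
        return 0.0  # n and omega must have the same parity
    shift = (n - omega) // 2
    p0 = max(0, shift)  # p >= ((n - omega) / 2)_+
    return math.exp(-2 * s) * sum(
        (s / 2) ** (2 * p + omega) * math.comb(2 * p + omega, p - shift)
        / (math.factorial(p) * math.factorial(p + omega)) for p in range(p0, p0 + terms))


def fourier_coef(omega, s, n, K=128):
    """n-th Fourier coefficient of z -> e^{-2s} I_omega(2s cos 2 pi z), trapezoidal rule."""
    h = 1.0 / K
    return h * sum(math.exp(-2 * s) * bessel_i(omega, 2 * s * math.cos(2 * math.pi * k * h))
                   * cmath.exp(-2j * math.pi * n * k * h) for k in range(K))
```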

Proof.

From the definition of the modified Bessel function of the first kind [Watson, 1922, p.77, Eq. 2], we have

$$I_{\omega}(z)=\sum_{p\geq 0}\frac{(z/2)^{2p+\omega}}{p!\,(p+\omega)!},$$

so that

$$\begin{aligned}
I_{\omega}(2s\cos 2\pi z)&=\sum_{p\geq 0}\frac{s^{2p+\omega}}{p!\,(p+\omega)!}\cos(2\pi z)^{2p+\omega}\\
&=\sum_{p\geq 0}\frac{(s/2)^{2p+\omega}}{p!\,(p+\omega)!}\left(e^{2\pi\mathrm{i}z}+e^{-2\pi\mathrm{i}z}\right)^{2p+\omega}\\
&=\sum_{p\geq 0}\frac{(s/2)^{2p+\omega}}{p!\,(p+\omega)!}\sum_{k=0}^{2p+\omega}\binom{2p+\omega}{k}e^{2\pi\mathrm{i}(2(p-k)+\omega)z}.
\end{aligned}\tag{46}$$

Substituting $n=2(p-k)+\omega$ in eq. 46, we see that $n$ has the same parity as $\omega$, that $k=p-\frac{n-\omega}{2}$, and that

$$I_{\omega}(2s\cos 2\pi z)=\sum_{p\geq 0}\frac{(s/2)^{2p+\omega}}{p!\,(p+\omega)!}\sum_{\substack{n=-(2p+\omega)\\ n\equiv\omega\ (\mathrm{mod}\ 2)}}^{2p+\omega}\binom{2p+\omega}{p-\frac{n-\omega}{2}}e^{2\pi\mathrm{i}nz}.\tag{47}$$

Equation 47 can be rewritten as

$$I_{\omega}(2s\cos 2\pi z)=\sum_{\substack{n\in\mathbb{Z}\\ n\equiv\omega\ (\mathrm{mod}\ 2)}}e^{2\pi\mathrm{i}nz}\sum_{p\geq 0}\frac{(s/2)^{2p+\omega}}{p!\,(p+\omega)!}\binom{2p+\omega}{p-\frac{n-\omega}{2}}\mathbf{1}_{-(2p+\omega)\leq n\leq 2p+\omega},$$

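Multiply by $e^{-2s}$ and note that, for $n\geq 0$, the indicator reduces to the condition $p\geq(\frac{n-\omega}{2})_{+}$. This gives eq. 45 as a concise rewriting. The case $n<0$ follows from the evenness of the left-hand side of eq. 45 in $z$. ∎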