Operator Learning of Lipschitz Operators:
An Information-Theoretic Perspective

Samuel Lanthaler, California Institute of Technology
(Date: June 26, 2024)
Abstract.

Operator learning based on neural operators has emerged as a promising paradigm for the data-driven approximation of operators, mapping between infinite-dimensional Banach spaces. Despite significant empirical progress, our theoretical understanding regarding the efficiency of these approximations remains incomplete. This work addresses the parametric complexity of neural operator approximations for the general class of Lipschitz continuous operators. Motivated by recent findings on the limitations of specific architectures, termed the curse of parametric complexity, we here adopt an information-theoretic perspective. Our main contribution establishes lower bounds on the metric entropy of Lipschitz operators in two approximation settings: uniform approximation over a compact set of input functions, and approximation in expectation, with input functions drawn from a probability measure. It is shown that these entropy bounds imply that, regardless of the activation function used, neural operator architectures attaining an approximation accuracy $\epsilon$ must have a size that is exponentially large in $\epsilon^{-1}$. The size of architectures is here measured by counting the number of encoded bits necessary to store the given model in computational memory. The results of this work elucidate fundamental trade-offs and limitations in operator learning.

1. Introduction

Operators mapping between infinite-dimensional Banach spaces of functions are ubiquitous in the natural sciences and engineering. They often appear in connection with physical models expressed as a set of partial differential equations, where operators of interest frequently arise from associated forward and inverse problems, e.g. mapping initial data to the solution at a later time, or identifying external forcing terms from (partial) knowledge of the solution.

Operator learning has emerged as a new paradigm for the data-driven approximation of such operators. Popular operator learning frameworks build on the success of neural networks, but generalize this notion to the infinite-dimensional context of operator approximation, resulting in so-called neural operators. These neural operator architectures define parametric operators, whose parameters are tuned to approximate an underlying operator of interest.

While there is a very rapidly growing body of empirical work demonstrating the great potential, and practical utility, of such data-driven approaches, many open questions remain in our understanding of the theoretical underpinnings of this field, see e.g. [30] for a recent review and references therein.

First theoretical insights into specific architectures, and their underlying approximation mechanisms, can be gained by studying universal approximation, i.e. the ability to approximate very general classes of operators. The study of universal approximation of neural operators dates back at least three decades, to early work on operator networks by Chen and Chen [12]. Due to the recent rise in the popularity of operator learning and the introduction of a number of novel state-of-the-art frameworks, this early work has been complemented by a number of papers in recent years, demonstrating similar universal approximation properties for various architectures; e.g. DeepONets [41, 36], PCA-Net [6, 34], Fourier neural operator [29] and general neural operators [31, 35], as well as multiple other architectures [54, 25, 23, 10, 11].

Universal approximation implies that there are no fundamental obstructions to operator learning with a given framework, and usually requires identification of basic approximation mechanisms that can be leveraged by a given architecture. However, to determine whether operator learning can be achieved efficiently, a refined quantitative analysis is required. In such quantitative analysis, one often distinguishes between parametric complexity, relating the required model size to the achieved accuracy, and sample efficiency, relating the number of required training samples to the achieved accuracy. The focus of the present work is on parametric complexity. For research relevant to the data complexity of operator learning, we mention, for example, [3, 4, 28, 44, 5].

One general class of operators for which efficient approximation is possible, in terms of the required number of tunable parameters, is that of so-called holomorphic operators. Research into the approximation of holomorphic operators goes back to the seminal work of Cohen, DeVore and Schwab [14, 15], where it was shown that this class of operators can be efficiently approximated by generalized polynomial expansions. More recently, these results have been extended to neural network and neural operator approximation in a series of works [51, 47, 52, 22, 2, 43], demonstrating that similar rates can be achieved by neural operators.

Other classes of operators for which efficient convergence rates have been derived are operator Barron spaces [27] and (operator) reproducing kernel Hilbert spaces (RKHS) [37, 46]. Alternative settings, such as parametric PDEs with low-dimensional latent structure are, for example, explored in [33, 38, 20].

Apart from these specific classes of operators, efficient approximation has also been established via a case-by-case analysis for several PDE solution operators [17, 29, 36, 34, 43, 21]. These results identify a number of individual operators of interest which can be efficiently approximated by certain operator learning frameworks. Despite this progress, a general theory encompassing all these examples has yet to emerge.

A very general class of operators of interest are Lipschitz operators. Approximation theory of relevance to such a general class of operators has been developed e.g. in [40, 21, 50, 48, 32]. All of these works aim to bound the number of tunable parameters (model size) in terms of the accuracy that can be achieved.

The present work will focus on deriving lower complexity bounds for the class of Lipschitz continuous operators $\mathcal{G}: \mathcal{D} \to \mathbb{R}$, defined on an infinite-dimensional domain $\mathcal{D}$ and taking values in $\mathbb{R}$ (nonlinear Lipschitz functionals). Semantically, no distinction will be made between ‘functional’ and ‘operator’, since all lower bounds established for functionals continue to hold when considering operators with infinite-dimensional output spaces, the latter containing (infinitely many) copies of $\mathbb{R}$.

In addition to the aforementioned literature on neural operator approximation theory, the present work also takes inspiration from the information-theoretic point of view on neural network approximation theory in a finite-dimensional setting, pioneered in the works [8, 9, 49, 53, 19], as well as notions of stable approximation [18, 13]. In the present work, the underlying ideas will be applied and extended to the infinite-dimensional context of operator learning.

The main motivation for this work comes from two recent results, established in [34] and [50] respectively, both applicable to the general setting of Lipschitz operators. A brief summary of these results is as follows:

  (i) The first result [34] shows that certain neural operator architectures, based on ReLU activations, suffer from a curse of parametric complexity: under certain assumptions on the input functions, there exist Lipschitz continuous operators which can only be approximated to accuracy $\epsilon$, if the number of tunable parameters is exponential in $\epsilon^{-1}$; more precisely, the number of parameters must be at least as large as $C \exp(c \epsilon^{-\gamma})$ with problem-dependent constants $C, c, \gamma > 0$.

  (ii) The second result in [50] shows that, under similar assumptions on the input functions, neural operator architectures based on super-expressive activation functions can approximate general Lipschitz operators to accuracy $\epsilon$, with algebraically bounded parameter count; the number of parameters is upper bounded by $C \epsilon^{-\gamma}$, for problem-dependent $C, \gamma > 0$.
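
To get a feel for the gap between the two scalings in (i) and (ii), the following sketch evaluates both expressions for a few accuracies. The constants $C = c = \gamma = 1$ are placeholder assumptions chosen purely for illustration; the actual constants in [34] and [50] are problem-dependent and are not reproduced here.

```python
import math

def exponential_count(eps, C=1.0, c=1.0, gamma=1.0):
    """Expression of the form C * exp(c * eps**(-gamma)), as in (i)."""
    return C * math.exp(c * eps ** (-gamma))

def algebraic_count(eps, C=1.0, gamma=1.0):
    """Expression of the form C * eps**(-gamma), as in (ii)."""
    return C * eps ** (-gamma)

if __name__ == "__main__":
    # Placeholder constants (assumption): C = c = gamma = 1.
    for eps in (0.5, 0.2, 0.1, 0.05):
        print(f"eps={eps:5.2f}  "
              f"exponential: {exponential_count(eps):.3e}  "
              f"algebraic: {algebraic_count(eps):.3e}")
```

With these placeholder constants, the exponential expression overtakes the algebraic one for every accuracy shown, and the ratio between the two grows rapidly as $\epsilon$ decreases.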

While the first result, viewed in isolation, appears to hint at fundamental limitations to the development of operator learning theory on the general class of Lipschitz operators, due to the identified “curse”, the second result shows rigorously that this curse can be circumvented with a suitable choice of activation.

The aim of the present work is to examine the apparent dichotomy between these two results in detail. To this end, we explore the curse of parametric complexity from an information-theoretic perspective. As a result, we will uncover the fundamental information-theoretic character of the curse of parametric complexity, and identify the relevant trade-offs that are possible when parametric complexity is measured by the number of (real-valued) parameters as in [34, 50].

Main contributions

This work makes the following main contributions:

  • We propose an information-theoretic perspective of operator learning, based on the relation between bit-encoding and Kolmogorov metric entropy; this provides an alternative to the prevalent analysis in the literature, which has focused on estimating the required number of real-valued parameters.

  • For the model class of Lipschitz operators, we derive lower bounds on the metric entropy in two settings: one pertaining to uniform approximation, the other to approximation in expectation.

  • These bounds imply, in either setting, that an exponentially large number of encoding bits is required to store the weights of any architecture achieving accuracy $\epsilon$ on the model class. This result holds independently of the activation function that is chosen.

  • We use topological arguments to show that even generic operators can only be approximated with exponentially increasing complexity; when applied to FNO this implies that the approximation of a generic Lipschitz operator, to accuracy $\epsilon$, requires a number of tunable parameters exponential in $\epsilon^{-1}$.

Overview

The remainder of this paper is organized as follows. In Section 2, we state the main results of this work, as they pertain to operator learning with neural operator architectures. This section contains the main conceptual contributions of this work and reviews the link between bit-encoding and Kolmogorov entropy. Several technical details are left to Sections 3 and 4; in Section 3, we derive lower bounds on the Kolmogorov metric entropy of the set of $1$-Lipschitz operators in both a sup-norm and $L^p$-norm approximation setting. In particular, we show that the metric $\epsilon$-entropy increases exponentially with $\epsilon^{-1}$, implying a general curse of parametric complexity for bit-encoded architectures. This is the first main technical contribution of this work. Approximation rates for generic operators are the subject of Section 4, where we first formulate the operator approximation problem in an abstract Banach space setting, and then use topological arguments to relate approximation rates of generic elements of a model class to the metric entropy of this class. This is the second main technical contribution of this work. Finally, Section 5 contains concluding remarks.

2. Main Results

This section contains a summary of the main results of this work, applied to the specific setting of operator learning. Several of these results are based on more general, abstract propositions which are included in subsequent Sections 3 and 4. To aid readability, we leave most technical details to these latter sections. The aim of this section is instead to explain the main ideas underlying our analysis, and their implications for operator learning. Recurring notation, to be introduced and discussed in the following, is summarized in Table 1.

| Notation | Meaning |
|---|---|
| $\mathcal{G}: \mathcal{D} \subset \mathcal{X} \to \mathcal{Y}$ | Nonlinear operator with domain $\mathcal{D}$ |
| $\mathcal{X}$, $\mathcal{Y}$ | (Input/output) Banach spaces |
| $\mathcal{K} \subset \mathcal{X}$ | Compact subset of inputs |
| $\mathcal{D}$ | Operator domain, $\mathcal{D} = \mathcal{K}$ or $\mathcal{D} = \mathcal{X}$ |
| $\mu \in \mathcal{P}(\mathcal{X})$ | Probability measure on $\mathcal{X}$ |
| $\mathrm{Lip}_1(\mathcal{D})$ | Real-valued $1$-Lipschitz operators, $\mathcal{G}: \mathcal{D} \to \mathbb{R}$ |
| $\mathsf{\bm{V}}$ | Banach space of operators, $\mathsf{\bm{V}} = C(\mathcal{K})$ or $\mathsf{\bm{V}} = L^p(\mu)$ |
| $\mathsf{\bm{A}} \subset \mathsf{\bm{V}}$ | Compact subset of $\mathsf{\bm{V}}$, e.g. $\mathsf{\bm{A}} = \mathrm{Lip}_1(\mathcal{D})$ |

Table 1. Recurring notation and definitions for operator learning.

2.1. Operator approximation by neural operators

We begin the discussion of our main results by proposing an encoder-decoder point of view on operator learning, where the encoder and decoder are implicitly defined by a given architecture. We then define approximation errors of interest and discuss two common measures to quantify the “complexity” of a given architecture. The first counts the number of tunable, real-valued parameters in the architecture. The second goes one step further, and requires specification of a bit-encoding of all parameters, i.e. encoding by a sequence of 0’s and 1’s. To fix intuition, this bit-encoding can be loosely interpreted as the representation of the parameters on computing hardware. The complexity of a bitwise-encoded architecture is measured by the number of bits required to represent it. As will be explained, this provides a link to fundamental information-theoretic concepts such as the Kolmogorov metric entropy of our model class.

2.1.1. Approximation theoretic setting

Assume we are given input and output spaces $\mathcal{X}$, $\mathcal{Y}$. A neural operator defines a parametrized mapping $\Phi: \mathcal{X} \times \mathbb{R}^q \to \mathcal{Y}$, where $\theta \in \mathbb{R}^q$ are tunable parameters. Specification of $\theta$ defines an operator, $\Phi({\,\cdot\,};\theta): \mathcal{X} \to \mathcal{Y}$. In practice, the training of a neural operator results in an optimized parameter choice $\theta_{\mathcal{G}}$ for given $\mathcal{G}: \mathcal{X} \to \mathcal{Y}$ and an approximation $\mathcal{G} \approx \Phi({\,\cdot\,};\theta_{\mathcal{G}})$.

Model class $\mathrm{Lip}_1(\mathcal{D})$

In the following, we will consider a model class of $1$-Lipschitz operators, restricting attention to the case of real-valued outputs, $\mathcal{Y} = \mathbb{R}$:

Definition 2.1 (Model class $\mathrm{Lip}_1$).

Let $(\mathcal{D}, d)$ be a metric space. We define $\mathrm{Lip}_1(\mathcal{D})$ as the set consisting of all $1$-Lipschitz continuous mappings $\mathcal{G}: \mathcal{D} \to \mathbb{R}$ with $\|\mathcal{G}\|_{\mathrm{Lip}} \le 1$, where we define the $\|{\,\cdot\,}\|_{\mathrm{Lip}}$-norm as follows:

(2.1) $\left\{ \begin{aligned} \|\mathcal{G}\|_{\mathrm{Lip}} &= \max\Big\{ \textstyle\sup_{u \in \mathcal{D}} |\mathcal{G}(u)|, \mathrm{Lip}(\mathcal{G}) \Big\}, \\ \mathrm{Lip}(\mathcal{G}) &= \sup_{u \neq v} \frac{|\mathcal{G}(u) - \mathcal{G}(v)|}{d(u,v)}. \end{aligned} \right.$
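
As a rough numerical illustration of (2.1), the following sketch estimates $\|\mathcal{G}\|_{\mathrm{Lip}}$ from finitely many sample inputs. It rests on several assumptions that are not part of the definition: inputs are discretized as vectors in $\mathbb{R}^m$, the metric $d$ is taken to be the Euclidean distance, and the helper name `empirical_lip_norm` is hypothetical. Since the suprema are replaced by maxima over a finite sample, the result can only underestimate the true norm.

```python
import numpy as np

def empirical_lip_norm(G, points):
    """Sample-based lower estimate of ||G||_Lip from (2.1).

    points: array of shape (n, m), n discretized inputs of dimension m.
    d(u, v) is assumed to be the Euclidean distance between rows.
    """
    values = np.array([G(u) for u in points], dtype=float)
    sup_abs = float(np.max(np.abs(values)))
    # Pairwise difference quotients |G(u) - G(v)| / d(u, v) for u != v.
    diffs = np.abs(values[:, None] - values[None, :])
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    mask = dists > 0
    lip = float(np.max(diffs[mask] / dists[mask])) if mask.any() else 0.0
    return max(sup_abs, lip)

rng = np.random.default_rng(0)
m = 8
# Sample points inside the unit ball of R^m (random direction, random radius).
directions = rng.normal(size=(50, m))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
points = directions * rng.uniform(0.0, 1.0, size=(50, 1))

# u -> ||u|| is 1-Lipschitz and bounded by 1 on the unit ball.
norm_functional = lambda u: float(np.linalg.norm(u))
estimate = empirical_lip_norm(norm_functional, points)
print(estimate)
```

For the norm functional on the unit ball, both terms inside the maximum in (2.1) are bounded by one, so the printed estimate does not exceed one up to rounding.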

As described in the introduction, the goal of operator learning is to approximate $\mathcal{G}: \mathcal{D} \to \mathbb{R}$ by a neural operator $\Phi: \mathcal{D} \times \mathbb{R}^q \to \mathbb{R}$. In this work, we aim to relate the approximation accuracy $\epsilon$ to the required model size of $\Phi$. We will focus on two settings, where either (i) $\mathcal{D} = \mathcal{K} \subset \mathcal{X}$ is a compact subset of a Banach space and the distance between operators is measured in the sup-norm over $\mathcal{K}$, or (ii) $\mathcal{D} = \mathcal{X}$ is a Banach space and the distance between operators is induced by the $L^p(\mu)$-norm with respect to a probability measure $\mu$ on $\mathcal{X}$ (cp. Table 2).

Approximation spaces and norms

To measure the accuracy of this approximation task, we have to define a distance between operators. To this end, we will consider a Banach space of operators $\mathsf{\bm{V}}$, allowing for an embedding $\mathrm{Lip}_1(\mathcal{D}) \subset \mathsf{\bm{V}}$. Throughout, we will consider one of the following two settings. In the first setting, we aim to approximate $\mathcal{G}$ over a compact domain $\mathcal{D} = \mathcal{K} \subset \mathcal{X}$:

Setting 2.2 (Uniform approximation).

If $\mathcal{G}: \mathcal{K} \to \mathbb{R}$ is an operator with compact domain $\mathcal{K} \subset \mathcal{X}$, we will study its uniform approximation over $\mathcal{K}$, i.e. we take $\mathsf{\bm{V}} = C(\mathcal{K})$ to be the space of continuous operators, metrized by the sup-norm:

(2.2) $\|\mathcal{G}\|_{C(\mathcal{K})} = \sup_{u \in \mathcal{K}} |\mathcal{G}(u)|.$

A common special case of this setting is the case where $\mathcal{K} \subset \mathcal{X}$ is defined by a smoothness constraint, as illustrated by the following example:

Example 2.3.

Let $D \subset \mathbb{R}^d$ be a bounded domain. An example of the setting above is the case of Lipschitz operators $\mathcal{G}: \mathcal{K} \subset L^2(D) \to \mathbb{R}$, with

$\mathcal{K} = \left\{ u \in H^s(D) \,\middle|\, \|u\|_{H^s(D)} \le C \right\},$

a set defined by a Sobolev smoothness constraint for $s > 0$. Here, $\mathcal{X} = L^2(D)$.
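
One way to see why such a smoothness constraint helps is that elements of $\mathcal{K}$ are uniformly well approximated in $L^2$ by finitely many modes. The sketch below checks this numerically under simplifying assumptions that go beyond the example: a periodic one-dimensional domain, an orthonormal Fourier-type basis indexed by $k = 1, 2, \dots$, and the norm $\|u\|_{H^s}^2 = \sum_k (1 + k^2)^s |\hat{u}_k|^2$. Under these assumptions, the $L^2$ norm of the part of $u$ carried by modes $k > N$ is at most $C (1 + N^2)^{-s/2}$. The function names are hypothetical.

```python
import numpy as np

def sample_sobolev_sphere(num_modes, s, C, rng):
    """Random coefficients (modes k = 1..num_modes) rescaled to H^s norm C.

    Assumes ||u||_{H^s}^2 = sum_k (1 + k^2)^s * coeff_k^2 (illustrative choice).
    """
    k = np.arange(1, num_modes + 1, dtype=float)
    coeffs = rng.normal(size=num_modes) / k
    hs_norm = np.sqrt(np.sum((1.0 + k ** 2) ** s * coeffs ** 2))
    return k, coeffs * (C / hs_norm)

def tail_l2(k, coeffs, N):
    """L^2 norm of the part of u carried by modes k > N (orthonormal basis assumed)."""
    return float(np.sqrt(np.sum(coeffs[k > N] ** 2)))

rng = np.random.default_rng(0)
s, C = 1.0, 1.0
k, coeffs = sample_sobolev_sphere(256, s, C, rng)
for N in (1, 4, 16, 64):
    bound = C * (1.0 + N ** 2) ** (-s / 2)
    print(N, tail_l2(k, coeffs, N), bound)
```

The bound depends only on $C$, $s$ and $N$, not on the particular element of $\mathcal{K}$, which is the uniformity that the smoothness constraint provides.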

In the second setting, we aim to approximate $\mathcal{G}$ over the entire Banach space $\mathcal{D} = \mathcal{X}$, but with respect to a (Bochner) $L^p(\mu)$-norm:

Setting 2.4 (Approximation in expectation).

If $\mathcal{G}: \mathcal{X} \to \mathbb{R}$ is an operator with unbounded domain $\mathcal{X}$, a separable Banach space, then we will assume that inputs are drawn at random from a probability measure $\mu \in \mathcal{P}(\mathcal{X})$. In this case, we fix $p \in [1, \infty)$ and take $\mathsf{\bm{V}} = L^p(\mu)$ as the space of $\mu$-measurable operators with finite $p$-th norm. $L^p(\mu)$ is metrized by the Bochner $L^p$-norm,

(2.3) $\|\mathcal{G}\|_{L^p(\mu)} = \mathbb{E}_{u \sim \mu}\left[ |\mathcal{G}(u)|^p \right]^{1/p}.$

| Operator domain | Operator class | Approximation space | Norm |
|---|---|---|---|
| $\mathcal{K} \subset \mathcal{X}$ compact | $\mathcal{G} \in \mathrm{Lip}_1(\mathcal{K})$ | $\mathsf{\bm{V}} = C(\mathcal{K})$ | sup-norm |
| $\mathcal{X}$ Banach | $\mathcal{G} \in \mathrm{Lip}_1(\mathcal{X})$ | $\mathsf{\bm{V}} = L^p(\mu)$ | $L^p(\mu)$-norm |

Table 2. Operator approximation settings
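
The two norms in Table 2 can be compared on a toy problem. The sketch below uses discrete surrogates for (2.2) and (2.3): a maximum over sampled inputs in place of the supremum, and a Monte Carlo average in place of the expectation. The inputs (Gaussian vectors in $\mathbb{R}^{16}$ with decaying coefficient scales), the functional `G`, the approximation `Phi` and the helper names are all illustrative assumptions, and for simplicity the same samples are used for both norms.

```python
import numpy as np

def sup_error(G, Phi, samples):
    """Surrogate for ||G - Phi||_{C(K)} in (2.2): maximum over sampled inputs."""
    return max(abs(G(u) - Phi(u)) for u in samples)

def lp_error(G, Phi, samples, p=2):
    """Monte Carlo surrogate for ||G - Phi||_{L^p(mu)} in (2.3)."""
    errs = np.array([abs(G(u) - Phi(u)) ** p for u in samples])
    return float(np.mean(errs) ** (1.0 / p))

rng = np.random.default_rng(1)
m = 16
# Finite-dimensional stand-in for u ~ mu: coefficients with decaying scales.
scales = 1.0 / np.arange(1, m + 1)
samples = rng.normal(size=(500, m)) * scales

# Bounded, 1-Lipschitz functional and an approximation seeing only 4 coefficients.
G = lambda u: min(1.0, float(np.linalg.norm(u)))
Phi = lambda u: min(1.0, float(np.linalg.norm(u[:4])))

e_sup = sup_error(G, Phi, samples)
e_l2 = lp_error(G, Phi, samples, p=2)
print(e_sup, e_l2)
```

On a common set of samples, the averaged error never exceeds the maximal error, which reflects that approximation in expectation is the weaker of the two requirements.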
Measures of complexity: Counting parameters versus bits

We will distinguish two ways of measuring the “complexity” of a neural operator $\Phi({\,\cdot\,};\theta)$: one based on the number of tunable (real-valued) parameters, the other requiring bit-encoding (or quantization) of the parameters.

A first intuitive notion of complexity is the minimal number of tunable parameters required to reach approximation accuracy $\epsilon$, i.e. the parameter dimension $q$ of a neural operator $\Phi: \mathcal{D} \times \mathbb{R}^q \to \mathcal{Y}$. As mentioned in the introduction, this point of view has been prevalent in the development of approximation theory for operator learning. As explained previously, depending on the type of activation function that is used, vastly different conclusions can be reached with this definition of complexity. This fact is well-known in the finite-dimensional setting: For example, it has been shown [42] that there exist smooth, sigmoidal activation functions for which a neural network of fixed size can approximate an arbitrary continuous function to arbitrary accuracy, i.e. approximation accuracy $\epsilon$ can be reached with a number of parameters $q = O(1)$.

In practical implementations, real-valued parameters can only be digitally represented to finite accuracy. This observation has led a number of authors [8, 9, 49, 53, 19] to analyze neural network approximation from a bit-encoding perspective. In this approach, the continuous parameters $\theta \in \mathbb{R}^q$ are replaced by quantized parameters $\theta \in \Theta$, where $\Theta \subset \mathbb{R}^q$ is a finite set. If the number of elements is bounded, say $|\Theta| = 2^B$ for some $B \in \mathbb{N}$, then we can identify $\Theta \simeq \{0,1\}^B$, i.e. each element in the set $\Theta$ is encoded by a string of $B$ bits. Taking this information-theoretic point of view, it is possible to derive (lower) complexity bounds that are independent of the activation function.
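
The identification $\Theta \simeq \{0,1\}^B$ can be made concrete with a simple quantizer. The sketch below is one possible choice of $\Theta$, assumed here for illustration only: a uniform grid on $[-R, R]^q$ with $b$ bits per parameter, so that $|\Theta| = 2^B$ with $B = q \cdot b$. The function names `quantize` and `dequantize`, the grid and the radius $R$ are assumptions, not part of the text.

```python
import numpy as np

def quantize(theta, bits_per_param, radius=1.0):
    """Map theta in [-radius, radius]^q to a bit string of length q * bits_per_param."""
    levels = 2 ** bits_per_param
    clipped = np.clip(theta, -radius, radius)
    # Index of the nearest point on a uniform grid with `levels` points per axis.
    idx = np.rint((clipped + radius) / (2 * radius) * (levels - 1)).astype(int)
    return "".join(format(int(i), f"0{bits_per_param}b") for i in idx)

def dequantize(bitstring, bits_per_param, radius=1.0):
    """Recover the grid point in Theta encoded by the bit string."""
    levels = 2 ** bits_per_param
    idx = [int(bitstring[j:j + bits_per_param], 2)
           for j in range(0, len(bitstring), bits_per_param)]
    return np.array(idx) / (levels - 1) * 2 * radius - radius

theta = np.array([0.3, -0.7, 1.0])          # q = 3 real-valued parameters
code = quantize(theta, bits_per_param=4)    # B = 3 * 4 = 12 bits
theta_q = dequantize(code, bits_per_param=4)
print(code, theta_q)
```

In this construction the per-parameter rounding error is at most half the grid spacing, $R / (2^b - 1)$, so a finer resolution of the parameters is paid for directly in bits.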

2.2. Encoder-decoder view of neural operators

Given the discussion of the last paragraph, we now outline an encoder-decoder point of view on neural operators, emphasizing the difference between “counting parameters” and “counting (encoding) bits”.

Counting parameters

Let $\Phi: \mathcal{D} \times \mathbb{R}^q \to \mathbb{R}$ be a neural operator architecture. To explain our intuition, we temporarily assume the existence of, and fix an optimal parameter choice $\theta_{\mathcal{G}} \in \mathbb{R}^q$ for each $\mathcal{G} \in \mathrm{Lip}_1(\mathcal{D})$, so that

(2.4) $\theta_{\mathcal{G}} \in \operatorname*{argmin}_{\theta \in \mathbb{R}^q} \|\mathcal{G} - \Phi({\,\cdot\,};\theta)\|_{\mathsf{\bm{V}}}, \quad \forall\, \mathcal{G} \in \mathrm{Lip}_1(\mathcal{D}),$

with respect to the relevant norm of interest on the space of operators $\mathsf{\bm{V}} \supset \mathrm{Lip}_1(\mathcal{D})$. The corresponding encoder is then given by

(2.5) $\mathcal{E}: \mathrm{Lip}_1(\mathcal{D}) \to \mathbb{R}^q, \quad \mathcal{G} \mapsto \theta_{\mathcal{G}}.$

The corresponding decoder is

(2.6) $\mathcal{D}: \mathbb{R}^q \to \mathsf{\bm{V}}, \quad \theta \mapsto \Phi({\,\cdot\,};\theta).$

In this way, the operator learning architecture $\Phi$ induces a natural encoder/decoder pair on the relevant space of operators, and we are interested in bounds on the encoding error, either for individual $\mathcal{G} \in \mathrm{Lip}_1(\mathcal{D})$, i.e.

(2.7) $\mathrm{Err}(\mathcal{G};\Phi)_{\mathsf{\bm{V}}} = \inf_{\theta \in \mathbb{R}^q} \|\mathcal{G} - \Phi({\,\cdot\,};\theta)\|_{\mathsf{\bm{V}}},$

or in a minimax sense, i.e.

(2.8) $\mathrm{Err}(\mathrm{Lip}_1(\mathcal{D});\Phi)_{\mathsf{\bm{V}}} = \sup_{\mathcal{G} \in \mathrm{Lip}_1(\mathcal{D})} \inf_{\theta \in \mathbb{R}^q} \|\mathcal{G} - \Phi({\,\cdot\,};\theta)\|_{\mathsf{\bm{V}}}.$

Given a desired approximation accuracy $\epsilon > 0$, either in the sense (2.7) or (2.8), one quantity of interest is the required “complexity” of any architecture $\Phi$ achieving this accuracy. The above point of view is consistent with estimates on the required number of parameters $q$.

Counting bits

As discussed before, the number of parameters $q$ is not a suitable measure of complexity when results independent of the activation are sought. Therefore, we now assume that the parameters $\theta\in\mathbb{R}^{q}$ are encoded by $B$ bits. This defines a subset $\Theta\subset\mathbb{R}^{q}$ consisting of $|\Theta|=2^{B}$ elements. Each $\theta\in\Theta$ is in correspondence with its bit-encoding $[\theta]\in\{0,1\}^{B}$. Thus, upon associating with any $\mathcal{G}\in\mathrm{Lip}_{1}(\mathcal{D})$ the optimal $\theta_{\mathcal{G}}\in\Theta$, the continuum encoder (2.5) is now replaced by a bitwise-encoder,

\[
\mathfrak{E}:\mathrm{Lip}_{1}(\mathcal{D})\to\{0,1\}^{B},\quad\mathcal{G}\mapsto[\theta_{\mathcal{G}}], \tag{2.9}
\]

with bitwise-decoder,

\[
\mathfrak{D}:\{0,1\}^{B}\to\mathsf{\bm{V}},\quad[\theta]\mapsto\Phi({\,\cdot\,};\theta). \tag{2.10}
\]

The individual and minimax errors, (2.7) and (2.8), have the following bit-encoded counterparts,

\[
\mathrm{Err}(\mathcal{G};\Phi,\Theta)_{\mathsf{\bm{V}}}=\inf_{\theta\in\Theta}\|\mathcal{G}-\Phi({\,\cdot\,};\theta)\|_{\mathsf{\bm{V}}}, \tag{2.11}
\]

and

\[
\mathrm{Err}(\mathrm{Lip}_{1}(\mathcal{D});\Phi,\Theta)_{\mathsf{\bm{V}}}=\sup_{\mathcal{G}\in\mathrm{Lip}_{1}(\mathcal{D})}\inf_{\theta\in\Theta}\|\mathcal{G}-\Phi({\,\cdot\,};\theta)\|_{\mathsf{\bm{V}}}. \tag{2.12}
\]

In the present work, we will focus on such a bit-encoding point of view, but mention that there are close links between these two points of view, if the mapping $\theta\mapsto\Phi({\,\cdot\,};\theta)$ possesses some stability properties. Specifically, this link will be used to derive lower complexity bounds for the Fourier neural operator in Section 2.6.
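The bitwise encoder/decoder pair (2.9), (2.10) can be illustrated on a toy problem. The following is a minimal sketch under strong simplifying assumptions that are not part of the paper: inputs are finite-dimensional vectors, the "architecture" `phi` is a hypothetical linear functional, each parameter is stored on a uniform grid in $[-1,1]$, and the supremum over inputs is replaced by a maximum over a few sample points. All function names are illustrative.

```python
import itertools
import numpy as np

def make_theta_grid(q, bits_per_param, lo=-1.0, hi=1.0):
    """Finite parameter set Theta: every entry takes one of 2**bits_per_param levels."""
    levels = np.linspace(lo, hi, 2 ** bits_per_param)
    return np.array(list(itertools.product(levels, repeat=q)))

def phi(u, theta):
    """Toy architecture (hypothetical): a linear functional of the input vector u."""
    return u @ theta

def encode(G_values, inputs, theta_grid):
    """Bitwise encoder: pick the best theta in Theta and return its index as a bit string."""
    preds = phi(inputs, theta_grid.T)                  # shape (n_inputs, |Theta|)
    errs = np.max(np.abs(preds - G_values[:, None]), axis=0)
    best = int(np.argmin(errs))
    B = (len(theta_grid) - 1).bit_length()             # |Theta| = 2**B
    return format(best, f"0{B}b")

def decode(bits, theta_grid):
    """Bitwise decoder: map the bit string back to the parameter vector it indexes."""
    return theta_grid[int(bits, 2)]

rng = np.random.default_rng(0)
inputs = rng.normal(size=(5, 2))                       # five sample inputs in R^2
theta_grid = make_theta_grid(q=2, bits_per_param=3)    # |Theta| = 2**6
G_values = phi(inputs, theta_grid[37])                 # target realised by an element of Theta
bits = encode(G_values, inputs, theta_grid)
theta_hat = decode(bits, theta_grid)
```

Here $B = 6$ bits index the $2^{6}$ admissible parameter vectors; the encoder only ever outputs one of these $2^{B}$ codes, which is the property exploited by the entropy arguments.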

2.3. Information-theoretic notions

The relevance of the bit-encoding point of view is that it relates directly to the (Kolmogorov) metric entropy of the underlying model class $\mathsf{\bm{A}}\subset\mathsf{\bm{V}}$ and allows results to be derived which are independent of specifics of the architecture such as the choice of activation function. Thus bit-encoding enables analysis relating directly to intrinsic topological properties of $\mathsf{\bm{A}}$.

Minimax code-length

Abstracting further our previous discussion, we make the following formal definition of abstract bitwise encoder/decoder pairs:

Definition 2.5 (Abstract bitwise encoder/decoder pairs).

Given a compact subset $\mathsf{\bm{A}}\subset\mathsf{\bm{V}}$ of a Banach space $\mathsf{\bm{V}}$, we denote by $\mathrm{Enc}_{B}(\mathsf{\bm{A}};\mathsf{\bm{V}})$ the set of all bitwise encoder/decoder pairs $(\mathfrak{E},\mathfrak{D})$ of length $B$, i.e. all pairs of mappings $\mathfrak{E}:\mathsf{\bm{A}}\to\{0,1\}^{B}$ and $\mathfrak{D}:\{0,1\}^{B}\to\mathsf{\bm{V}}$.

Following [9], for $\epsilon>0$, we also introduce the minimax code length $\mathcal{L}(\mathsf{\bm{A}};\epsilon)_{\mathsf{\bm{V}}}$ of a compact set $\mathsf{\bm{A}}\subset\mathsf{\bm{V}}$ as the minimal number of bits $B$ for which there exists an (abstract) encoder/decoder pair $(\mathfrak{E},\mathfrak{D})\in\mathrm{Enc}_{B}(\mathsf{\bm{A}};\mathsf{\bm{V}})$ such that

\[
\sup_{\mathcal{G}\in\mathsf{\bm{A}}}\|\mathcal{G}-\mathfrak{D}\circ\mathfrak{E}(\mathcal{G})\|_{\mathsf{\bm{V}}}\leq\epsilon.
\]

That is,

\[
\mathcal{L}(\mathsf{\bm{A}};\epsilon)_{\mathsf{\bm{V}}}:=\min\left\{B\in\mathbb{N}\,\middle|\,\begin{gathered}\exists\,(\mathfrak{E},\mathfrak{D})\in\mathrm{Enc}_{B}(\mathsf{\bm{A}};\mathsf{\bm{V}})\text{ s.t. }\\ \textstyle\sup_{\mathcal{G}\in\mathsf{\bm{A}}}\|\mathcal{G}-\mathfrak{D}\circ\mathfrak{E}(\mathcal{G})\|_{\mathsf{\bm{V}}}\leq\epsilon\end{gathered}\right\}. \tag{2.15}
\]

Kolmogorov metric entropy

Given a metric space $(\mathsf{\bm{V}},d)$, element $g\in\mathsf{\bm{V}}$ and $r>0$, we denote by

\[
\overline{B_{r}}(g):=\left\{f\in\mathsf{\bm{V}}\,\middle|\,d(g,f)\leq r\right\},
\]

the closed ball of radius $r$ centered at $g$. We now make the following definition for the covering number and (Kolmogorov) metric entropy:

Definition 2.6 (Covering number and metric entropy).

Let $(\mathsf{\bm{V}},d)$ be a metric space. For $\epsilon>0$, the $\epsilon$-covering number of a set $\mathsf{\bm{A}}\subset\mathsf{\bm{V}}$, denoted $\mathcal{N}(\mathsf{\bm{A}};\epsilon)_{\mathsf{\bm{V}}}$, is the smallest integer $N\in\mathbb{N}$, such that $\mathsf{\bm{A}}$ can be covered by $N$ closed balls of radius $\epsilon$, i.e.

\[
\mathcal{N}(\mathsf{\bm{A}};\epsilon)_{\mathsf{\bm{V}}}:=\min\left\{N\in\mathbb{N}\,\middle|\,\exists\,g_{1},\dots,g_{N}\in\mathsf{\bm{V}},\text{ s.t. }\mathsf{\bm{A}}\subset\textstyle{\bigcup_{j=1}^{N}}\overline{B_{\epsilon}}(g_{j})\right\}. \tag{2.16}
\]

We note that the subscript $\mathsf{\bm{V}}$ is used as a shorthand for $(\mathsf{\bm{V}},d)$, with the relevant metric $d$ implied. The metric entropy of $\mathsf{\bm{A}}\subset\mathsf{\bm{V}}$ is defined as the logarithm (to base $2$) of the covering number, i.e.

\[
\mathcal{H}(\mathsf{\bm{A}};\epsilon)_{\mathsf{\bm{V}}}=\log_{2}\mathcal{N}(\mathsf{\bm{A}};\epsilon)_{\mathsf{\bm{V}}}. \tag{2.17}
\]
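To make the definitions (2.16) and (2.17) concrete, the following sketch builds an $\epsilon$-cover of a finite point set by a simple greedy rule. This is only an illustration and not a construction used in the paper: the greedy rule restricts centers to points of the set itself, so the number of balls it returns is an upper bound on the covering number, and its base-2 logarithm is an upper bound on the metric entropy. The function name and the test set (a grid in $[0,1]$) are assumptions made for the example.

```python
import math
import numpy as np

def greedy_cover(points, eps):
    """Greedy eps-cover of a finite point set by closed balls centered at set points.

    Returns the list of chosen centers; its length bounds the covering number from above.
    """
    points = np.asarray(points, dtype=float)
    uncovered = np.ones(len(points), dtype=bool)
    centers = []
    while uncovered.any():
        c = points[np.argmax(uncovered)]            # first point not yet covered
        centers.append(c)
        dist = np.linalg.norm(points - c, axis=1)
        uncovered &= dist > eps                     # closed ball: dist <= eps is covered
    return centers

# Grid of 101 points in [0, 1], viewed as a subset of (R, |.|).
points = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
eps = 0.105
centers = greedy_cover(points, eps)
entropy_upper_bound = math.log2(len(centers))       # upper bound on H(A; eps)
```

Since the optimal centers in (2.16) may lie anywhere in $\mathsf{\bm{V}}$, the true covering number can be noticeably smaller than the greedy count; lower bounds on the entropy, as needed in this work, require separate arguments.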
Link between minimax code-length and metric entropy

The minimax code-length and metric entropy introduced in the previous paragraphs are linked by the following fundamental result [16, Rmk. 5.10]:

Proposition 2.7.

Let $\mathsf{\bm{V}}$ be a Banach space, and let $\mathsf{\bm{A}}\subset\mathsf{\bm{V}}$ be compact. Then the metric entropy of $\mathsf{\bm{A}}$ provides a lower bound on the minimax code length:

\[
\mathcal{L}(\mathsf{\bm{A}};\epsilon)_{\mathsf{\bm{V}}}\geq\mathcal{H}(\mathsf{\bm{A}};\epsilon)_{\mathsf{\bm{V}}}. \tag{2.18}
\]

Proof.

Let $\epsilon>0$ be given. Let $(\mathfrak{E},\mathfrak{D})$ be a bitwise encoder/decoder pair with $B=\mathcal{L}(\mathsf{\bm{A}};\epsilon)_{\mathsf{\bm{V}}}$ bits, achieving reconstruction error at most $\epsilon$ on $\mathsf{\bm{A}}$. The image of $\mathfrak{D}:\{0,1\}^{B}\to\mathsf{\bm{V}}$ contains at most $N=2^{B}$ elements, $\mathcal{G}_{1},\dots,\mathcal{G}_{N}$. Since, for any $\mathcal{G}\in\mathsf{\bm{A}}$, the specific choice $\mathfrak{D}\circ\mathfrak{E}(\mathcal{G})$ belongs to the image of $\mathfrak{D}$, it follows that

\[
\sup_{\mathcal{G}\in\mathsf{\bm{A}}}\inf_{n=1,\dots,N}\|\mathcal{G}-\mathcal{G}_{n}\|\leq\sup_{\mathcal{G}\in\mathsf{\bm{A}}}\|\mathcal{G}-\mathfrak{D}\circ\mathfrak{E}(\mathcal{G})\|\leq\epsilon.
\]

Thus, $\mathsf{\bm{A}}\subset\bigcup_{n=1}^{N}\overline{B_{\epsilon}}(\mathcal{G}_{n})$, implying that the covering number of $\mathsf{\bm{A}}$ is bounded by

\[
\mathcal{N}(\mathsf{\bm{A}};\epsilon)_{\mathsf{\bm{V}}}\leq N=2^{B}.
\]

Taking logarithms (to base $2$) and recalling that $B=\mathcal{L}(\mathsf{\bm{A}};\epsilon)_{\mathsf{\bm{V}}}$ yields the claim. ∎

In particular, Proposition 2.7 implies that if $\mathcal{H}(\mathsf{\bm{A}};\epsilon)_{\mathsf{\bm{V}}}>B$, then there cannot exist a bit-encoder-decoder pair $(\mathfrak{E},\mathfrak{D})\in\mathrm{Enc}_{B}(\mathsf{\bm{A}};\mathsf{\bm{V}})$ achieving uniform decoding accuracy $\epsilon$ over $\mathsf{\bm{A}}$. Conversely, if $(\mathfrak{E},\mathfrak{D})$ is an encoder-decoder pair (2.9), (2.10) associated with a bit-encoded neural operator $\Phi:\mathcal{D}\times\Theta\to\mathbb{R}$ with $|\Theta|\leq 2^{B}$, and if the following minimax approximation bound holds,

\[
\sup_{\mathcal{G}\in\mathrm{Lip}_{1}(\mathcal{D})}\inf_{\theta\in\Theta}\|\mathcal{G}-\Phi({\,\cdot\,};\theta)\|_{\mathsf{\bm{V}}}\leq\epsilon,
\]

this implies that $B\geq\mathcal{H}(\mathrm{Lip}_{1}(\mathcal{D});\epsilon)_{\mathsf{\bm{V}}}$.
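The inequality (2.18) can be checked by hand in a one-dimensional toy case, which may help build intuition. The sketch below is an illustration only and assumes $\mathsf{\bm{A}}=[0,1]\subset\mathbb{R}$ with the absolute-value metric, for which the covering number is $\lceil 1/(2\epsilon)\rceil$ since each closed ball has length $2\epsilon$. The encoder/decoder pair is the midpoint quantizer with $2^{B}$ cells; the function names are hypothetical.

```python
import math

def covering_number_unit_interval(eps):
    """Exact eps-covering number of [0, 1] in (R, |.|): each closed ball has length 2*eps."""
    return max(1, math.ceil(1.0 / (2.0 * eps)))

def encode(x, B):
    """B-bit encoder: index of the cell of width 2**-B containing x in [0, 1]."""
    k = min(int(x * 2 ** B), 2 ** B - 1)
    return format(k, f"0{B}b")

def decode(bits):
    """Decoder: midpoint of the cell indexed by the bit string."""
    B = len(bits)
    return (int(bits, 2) + 0.5) / 2 ** B

def min_bits(eps):
    """Smallest B >= 1 for which the midpoint quantizer has worst-case error <= eps."""
    B = 1
    while 2.0 ** -(B + 1) > eps:    # worst-case error of the B-bit quantizer is 2**-(B+1)
        B += 1
    return B

for eps in (0.2, 0.03, 0.001):
    H = math.log2(covering_number_unit_interval(eps))
    print(f"eps={eps}: entropy={H:.3f}, bits used by the quantizer={min_bits(eps)}")
```

In this example the number of bits used by the quantizer is never smaller than the metric entropy, in line with Proposition 2.7; the proposition asserts that no other encoder/decoder pair can do better than the entropy either.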

2.4. Information-theoretic minimax bounds

As a consequence of Proposition 2.7, we can derive a lower bound on the required number of bits $B$ to achieve the minimax bound (2.12) by estimating the entropy of $\mathrm{Lip}_{1}(\mathcal{D})\subset\mathsf{\bm{V}}$. As mentioned before, we will consider two settings, corresponding to uniform approximation of $\mathcal{G}$ over a compact set $\mathcal{K}$ (the setting $\mathcal{D}=\mathcal{K}$) and approximation with respect to a Bochner $L^{p}(\mu)$-norm for a probability measure $\mu$ (the setting $\mathcal{D}=\mathcal{X}$).

Uniform approximation

We now consider $\mathcal{K}\subset\mathcal{X}$ a compact set of input functions, and operators belonging to $\mathrm{Lip}_{1}(\mathcal{K})\subset C(\mathcal{K})$ (cp. Setting 2.2). This corresponds to the choice $\mathcal{D}=\mathcal{K}$, $\mathsf{\bm{A}}=\mathrm{Lip}_{1}(\mathcal{K})$, $\mathsf{\bm{V}}=C(\mathcal{K})$, in the discussion of the previous section. We then have the following result:

Theorem 2.8.

Let $\mathcal{X}$ be a Banach space. Let $\mathcal{K}\subset\mathcal{X}$ be a compact set of input functions, and assume that the metric entropy of $\mathcal{K}$ satisfies the lower bound $\mathcal{H}(\mathcal{K};\epsilon)_{\mathcal{X}}\geq c_{\alpha}\epsilon^{-1/\alpha}$ for $\alpha>0$. There exists a constant $c>0$, independent of $\epsilon$, such that the following holds: If $\Phi:\mathcal{K}\times\Theta\to\mathbb{R}$ is a quantized neural operator architecture, satisfying

\[
\sup_{\mathcal{G}\in\mathrm{Lip}_{1}(\mathcal{K})}\inf_{\theta\in\Theta}\|\mathcal{G}-\Phi({\,\cdot\,};\theta)\|_{C(\mathcal{K})}\leq\epsilon,
\]

and if $|\Theta|\leq 2^{B}$, i.e. if the parameters of $\Phi$ can be encoded by $B$ bits, then

\[
B\geq\exp(c\epsilon^{-1/\alpha}).
\]

Proof.

The claim follows from the relation between the minimax code-length and the metric entropy of $\mathrm{Lip}_{1}(\mathcal{K})\subset C(\mathcal{K})$, stated in the above Proposition 2.7, and the following general bound on $\mathcal{H}(\mathrm{Lip}_{1}(\mathcal{K});\epsilon)_{C(\mathcal{K})}$:

\[
\mathcal{H}(\mathrm{Lip}_{1}(\mathcal{K});\epsilon)_{C(\mathcal{K})}\geq 2^{\mathcal{H}(\mathcal{K};6\epsilon)_{\mathcal{X}}}.
\]

This bound will be shown in Section 3.2, Proposition 3.2. Assuming this bound, then by assumption on $\mathcal{K}$, we have $2^{\mathcal{H}(\mathcal{K};6\epsilon)_{\mathcal{X}}}\geq\exp(c\epsilon^{-1/\alpha})$ for a constant $c>0$. ∎
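For the reader's convenience, the chain of inequalities behind this proof can be written out in one display. This is a sketch of the bookkeeping only; it assumes that the entropy hypothesis on $\mathcal{K}$ is applied at scale $6\epsilon$, and the explicit value of $c$ below is simply what that substitution produces, not a constant stated in the text.

```latex
\begin{align*}
B &\ge \mathcal{L}(\mathrm{Lip}_{1}(\mathcal{K});\epsilon)_{C(\mathcal{K})}
   \ge \mathcal{H}(\mathrm{Lip}_{1}(\mathcal{K});\epsilon)_{C(\mathcal{K})}
   && \text{(Proposition 2.7)} \\
  &\ge 2^{\mathcal{H}(\mathcal{K};6\epsilon)_{\mathcal{X}}}
   && \text{(Proposition 3.2)} \\
  &\ge 2^{c_{\alpha}(6\epsilon)^{-1/\alpha}}
   && \text{(assumption on $\mathcal{K}$)} \\
  &= \exp\!\big(c\,\epsilon^{-1/\alpha}\big),
   \qquad c := c_{\alpha}\,6^{-1/\alpha}\ln 2 .
\end{align*}
```

The first inequality uses that the pair (2.9), (2.10) associated with $\Phi$ is an element of $\mathrm{Enc}_{B}$ achieving accuracy $\epsilon$, so that the minimal code length cannot exceed $B$.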

If $\mathcal{X}$ is a function space, then compact subsets $\mathcal{K}\subset\mathcal{X}$ are commonly defined by a smoothness constraint, and this partly motivates our assumption on $\mathcal{K}$ in the last theorem. The following example is illustrative.

Example 2.9.

Let $D\subset\mathbb{R}^{d}$ be a bounded domain. Let $\mathcal{X}=L^{2}(D)$. An example of the setting outlined above is the case of Lipschitz operators $\mathcal{G}:\mathcal{K}\to\mathbb{R}$, with

\[
\mathcal{K}=\left\{u\in H^{s}(D)\,\middle|\,\|u\|_{H^{s}(D)}\leq C\right\},
\]

defined by a Sobolev smoothness constraint for $C,s>0$. In this case, it is well-known that the metric entropy satisfies $\mathcal{H}(\mathcal{K};\epsilon)_{\mathcal{X}}\gtrsim\epsilon^{-d/s}$, i.e. the assumptions of Theorem 2.8 hold with $\alpha=s/d$.
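To get a feeling for the size of the bound in Theorem 2.8 in the Sobolev setting, one can evaluate $\exp(c\epsilon^{-d/s})$ on a logarithmic scale. The sketch below is purely illustrative: the constant $c$ is not specified by the theorem, so the default `c=1.0` is a placeholder assumption, and the function name is hypothetical. The bound is reported as $\log_{2}B$ to avoid overflow.

```python
import math

def log2_bits_lower_bound(eps, s, d, c=1.0):
    """log2 of the bound B >= exp(c * eps**(-1/alpha)) with alpha = s/d.

    The constant c is unknown in general; c=1.0 is a placeholder for illustration.
    """
    alpha = s / d
    return c * eps ** (-1.0 / alpha) / math.log(2.0)

# Accuracy eps = 0.1 on a two-dimensional domain, for two smoothness levels.
smooth = log2_bits_lower_bound(eps=0.1, s=2, d=2)   # alpha = 1
rough = log2_bits_lower_bound(eps=0.1, s=1, d=2)    # alpha = 1/2
print(f"log2(B) >= {smooth:.1f} for s=2, and >= {rough:.1f} for s=1")
```

Under these placeholder constants, lowering the smoothness $s$ (or raising the dimension $d$) decreases $\alpha=s/d$ and therefore increases the exponent $1/\alpha$ in the bound.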

Approximation in expectation

Another commonly studied setting concerns the approximation in expectation (cp. Setting 2.4). Here, we consider $1$-Lipschitz mappings $\mathcal{G}:\mathcal{X}\to\mathbb{R}$ defined on a separable Hilbert space $\mathcal{X}$. We fix a probability measure $\mu$ on $\mathcal{X}$ and consider inputs as random draws $u\sim\mu$. To derive quantitative lower bounds, we will need to make minimal structural assumptions on $\mu$.

Assumption 2.10.

There exists an orthonormal basis $e_{1},e_{2},\dots$ of $\mathcal{X}$, probability space $(\Omega,\mathbb{P})$ and summable coefficients $\lambda_{1}\geq\lambda_{2}\geq\dots$, such that $\mu$ is the law of a random variable $u:\Omega\to\mathcal{X}$ of the form,

\[
u(\omega)=\sum_{j=1}^{\infty}\sqrt{\lambda_{j}}Z_{j}(\omega)e_{j},\quad(\omega\in\Omega), \tag{2.19}
\]

where $Z_{j}:\Omega\to\mathbb{R}$ are jointly independent random variables. We assume that the random variable $Z_{j}$ satisfies $\mathbb{E}|Z_{j}|^{2}=1$, and has law $Z_{j}\sim\rho_{j}(z)\,dz$ for a probability density function $\rho_{j}:\mathbb{R}\to\mathbb{R}_{+}$. We furthermore assume that there exists a constant $L>0$, such that

\[
\sup_{j\in\mathbb{N}}\|\rho_{j}\|_{L^{\infty}(\mathbb{R})}\leq L,\quad\sqrt{\lambda_{1}}\leq L. \tag{2.20}
\]

A concrete, and widely considered, example satisfying Assumption 2.10 is the case of a Gaussian probability measure $\mu$ with prescribed mean and covariance operator. In this case, $\lambda_{j}$ are the eigenvalues of the covariance operator, $e_{j}$ the corresponding eigenfunctions, and the random variables $Z_{j}\sim\rho_{j}$ have standard Gaussian distribution.
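The expansion (2.19) is straightforward to sample once it is truncated. The following is a minimal sketch under assumptions made only for illustration: the series is cut off after `n_modes` terms, the coefficients are chosen as $\sqrt{\lambda_{j}}=j^{-\alpha}$ (summability of $\lambda_{j}$ then requires $\alpha>1/2$), the $Z_{j}$ are standard Gaussian, and each sample is represented by its coefficients with respect to the orthonormal basis $e_{j}$. The function name is hypothetical.

```python
import numpy as np

def sample_kl_coefficients(n_samples, n_modes, alpha, rng):
    """Draw truncated expansions u = sum_j sqrt(lambda_j) Z_j e_j in coefficient form.

    Returns the coefficient array (n_samples, n_modes) and the eigenvalues lambda_j.
    Assumes sqrt(lambda_j) = j**(-alpha) and Z_j ~ N(0, 1), independent.
    """
    j = np.arange(1, n_modes + 1, dtype=float)
    sqrt_lam = j ** (-alpha)
    Z = rng.standard_normal((n_samples, n_modes))
    return sqrt_lam * Z, sqrt_lam ** 2

rng = np.random.default_rng(0)
coeffs, lam = sample_kl_coefficients(n_samples=20000, n_modes=50, alpha=1.0, rng=rng)

# Since the e_j are orthonormal, ||u||^2 is the sum of squared coefficients,
# and its expectation equals the sum of the (truncated) eigenvalues.
mean_sq_norm = np.mean(np.sum(coeffs ** 2, axis=1))
print(mean_sq_norm, lam.sum())
```

For standard Gaussian $Z_{j}$ the densities satisfy $\|\rho_{j}\|_{L^{\infty}(\mathbb{R})}=1/\sqrt{2\pi}$, so (2.20) holds with $L=\max\{1/\sqrt{2\pi},\sqrt{\lambda_{1}}\}$.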

Theorem 2.11.

Let $\mathcal{X}$ be a Banach space of input functions. Let $\mu\in\mathcal{P}(\mathcal{X})$ be a probability measure satisfying Assumption 2.10. Assume that the coefficients $\sqrt{\lambda_{j}}\gtrsim j^{-\alpha}$ as $j\to\infty$, where $\alpha>0$. Then there exists a constant $c>0$, independent of $\epsilon$, such that the following holds: If $\Phi:\mathcal{X}\times\Theta\to\mathbb{R}$ is a quantized neural operator architecture, satisfying

\[
\sup_{\mathcal{G}\in\mathrm{Lip}_{1}(\mathcal{X})}\inf_{\theta\in\Theta}\|\mathcal{G}-\Phi({\,\cdot\,};\theta)\|_{L^{p}(\mu)}\leq\epsilon,
\]

and if $|\Theta|\leq 2^{B}$, i.e. if the parameters of $\Phi$ can be encoded by $B$ bits, then

\[
B\geq\exp(c\epsilon^{-1/(\alpha+1)}).
\]

Proof.

Similarly to the uniform case, the present claim again follows from the relation between the minimax code-length and the metric entropy of $\mathrm{Lip}_{1}(\mathcal{X})\subset L^{p}(\mu)$ of Proposition 2.7, together with the following general bound on $\mathcal{H}(\mathrm{Lip}_{1}(\mathcal{X});\epsilon)_{L^{p}(\mu)}$:

\[
\mathcal{H}(\mathrm{Lip}_{1}(\mathcal{X});\epsilon)_{L^{p}(\mu)}\geq\exp(c\epsilon^{-1/(\alpha+1)}).
\]

This lower entropy bound will be derived in Section 3.3, Proposition 3.6. ∎

Thus, an exponential number of encoding bits is also needed in an $L^{p}(\mu)$-setting. Theorem 2.11 shows that the approximation of Lipschitz operators in expectation is not “qualitatively” easier than uniform approximation of such operators over a compact set of input functions.

2.5. Approximation of generic Lipschitz operators

Theorems 2.8 and 2.11 show that operator learning architectures that can approximate arbitrary $1$-Lipschitz operators to accuracy $\epsilon$ have exponential memory requirements; any (bit-encoded) implementation of such an architecture will require a number of bits that is exponential in $\epsilon^{-1}$. The reason for this is that the space of Lipschitz operators is exponentially large in a fundamental information-theoretic sense quantified by the metric entropy.
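The exponential bit requirement can equivalently be read as a bound on the accuracy attainable with a given bit budget. The display below is a direct rearrangement of the bounds in Theorems 2.8 and 2.11, included as a reading aid; it assumes $B>1$, so that $\ln B>0$, and uses the same unspecified constant $c>0$ as in the respective theorem.

```latex
\begin{align*}
B \ge \exp\!\big(c\,\epsilon^{-1/\alpha}\big)
  \;\iff\; \ln B \ge c\,\epsilon^{-1/\alpha}
  \;\iff\; \epsilon \ge \left(\frac{c}{\ln B}\right)^{\alpha},
\\
B \ge \exp\!\big(c\,\epsilon^{-1/(\alpha+1)}\big)
  \;\iff\; \epsilon \ge \left(\frac{c}{\ln B}\right)^{\alpha+1}.
\end{align*}
```

In other words, under the hypotheses of either theorem, the minimax accuracy achievable with $B$ bits can improve at most logarithmically in $B$.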

However, this minimax bound applies to the approximation of the entire class $\mathrm{Lip}_1(\mathcal{D})$ by a single architecture, and does not necessarily imply that it is impossible to approximate individual $\mathcal{G}\in\mathrm{Lip}_1(\mathcal{D})$ efficiently. At first sight, it could appear that arguments based on the metric entropy cannot be used to gain any insight into this refined question. Indeed, if we fix an individual $\mathcal{G}\in\mathrm{Lip}_1(\mathcal{D})$, then the metric entropy of the singleton set $\mathsf{\bm{A}}=\{\mathcal{G}\}$ is trivially $=0$, and the minimax code length (2.15) is $=1$ for any value of the accuracy $\epsilon$, since the trivial decoder $\mathfrak{D}(\,\cdot\,)\equiv\mathcal{G}$ reproduces $\mathcal{G}$ exactly, with vanishing approximation error, $\epsilon=0$. Thus, while entropy arguments give insights into the (concurrent) approximation of the set $\mathrm{Lip}_1(\mathcal{D})$, they seemingly have no immediate implications for the approximation of individual $\mathcal{G}\in\mathrm{Lip}_1(\mathcal{D})$.

Despite these facts, the results below will show that a refined analysis based on the concept of metric entropy is nevertheless possible; in the uniform and $L^p$-settings of the previous section, a fixed sequence of bit-encoded architectures $\{\Phi_n\}_{n\in\mathbb{N}}$, with at most $n$ bits, can approximate generic elements $\mathcal{G}\in\mathrm{Lip}_1(\mathcal{D})$ at best at a logarithmic rate, $\mathrm{Err}(\mathcal{G};\Phi_n,\Theta_n)\lesssim\log(n)^{-\gamma}$ for fixed $\gamma>0$. Before stating our result, we briefly recall the notion of a generic element of a (compact) metric space (see Appendix A for further remarks, and [45, Chap. 8] for an in-depth discussion):

Definition 2.12 (Topologically generic properties).

Let $(\mathsf{\bm{A}},d)$ be a compact metric space. A subset $\mathsf{\bm{R}}\subset\mathsf{\bm{A}}$ is called residual, if it is equal to a countable intersection of sets, each of whose interior is dense in $\mathsf{\bm{A}}$. The complement of a residual set is a meagre set. A property $P$ is called generic, if the set

\[ \mathsf{\bm{R}} := \left\{\mathcal{G}\in\mathsf{\bm{A}} \,\middle|\, \mathcal{G}\text{ satisfies }P\right\} \subset \mathsf{\bm{A}}, \]

is residual.

Under the assumption that $(\mathsf{\bm{A}},d)$ is compact, the Baire category theorem (cp. Appendix A) implies that any residual set $\mathsf{\bm{R}}$ is dense in $\mathsf{\bm{A}}$. Furthermore, the intersection $\mathsf{\bm{R}}=\bigcap_{j=1}^{\infty}\mathsf{\bm{R}}_j$ of countably many residual sets $\mathsf{\bm{R}}_1,\mathsf{\bm{R}}_2,\dots$ is itself residual, and hence still dense. In this sense, a topologically generic property is somewhat analogous to a property that holds with probability $1$ in a probabilistic sense. Thus, a generic property is often thought of as a property that is satisfied by “almost every” element of $\mathsf{\bm{A}}$.

We can now state our main results on the approximation of generic operators $\mathcal{G}\in\mathrm{Lip}_1(\mathcal{D})$. In the uniform setting (cp. Setting 2.2), we have:

Proposition 2.13 (Uniform approximation of generic operators).

Let $\mathcal{X}$ be a Banach space of input functions. Let $\mathcal{K}\subset\mathcal{X}$ be compact, and assume that the metric entropy $\mathcal{H}(\mathcal{K};\epsilon)_{\mathcal{X}}\gtrsim\epsilon^{-1/\alpha}$ for $\alpha>0$. Let $\{\Phi_n:\mathcal{K}\times\Theta_n\to\mathbb{R}\}_{n\in\mathbb{N}}$ be a sequence of bit-encoded neural operator architectures, with quantized parameter set $|\Theta_n|\leq 2^n$.
Then generic $\mathcal{G}\in\mathrm{Lip}_1(\mathcal{K})$ cannot be approximated by $\{\Phi_n\}$ at a convergence rate better than $\log(n)^{-\alpha}$; more precisely, for any sequence $\epsilon_n=o(\log(n)^{-\alpha})$, there is a residual subset $\mathsf{\bm{R}}\subset\mathrm{Lip}_1(\mathcal{K})$, consisting of operators $\mathcal{G}\in\mathsf{\bm{R}}$, for which

\[ \inf_{\theta\in\Theta_n}\|\mathcal{G}-\Phi_n(\,\cdot\,;\theta)\|_{C(\mathcal{K})}\neq O(\epsilon_n),\quad(n\to\infty). \]

Proof.

We let $\mathsf{\bm{V}}:=C(\mathcal{K})$ and $\mathsf{\bm{A}}:=\mathrm{Lip}_1(\mathcal{K})$. We note that $\mathsf{\bm{A}}\subset\mathsf{\bm{V}}$ is a compact, convex subset. We then consider the sequence of subsets $\Sigma_n\subset C(\mathcal{K})$, defined by all possible realizations,

\[ \Sigma_n := \left\{\Phi_n(\,\cdot\,;\theta) \,\middle|\, \theta\in\Theta_n\right\}. \]

By assumption, $|\Sigma_n|=|\Theta_n|\leq 2^n$. By Proposition 3.2, to be proved in Section 3.2, we have $\mathcal{H}(\mathsf{\bm{A}};\epsilon)_{\mathsf{\bm{V}}}\geq\exp(c\epsilon^{-1/\alpha})$. The claim of Proposition 2.13 then follows, as a special case, from the abstract result of Proposition 4.2 to be derived in Section 4. ∎
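The origin of the rate $\log(n)^{-\alpha}$ can be seen from the following heuristic counting argument. It is not the proof, which relies on Proposition 4.2 and a Baire category argument, and it only addresses approximation of the whole class at once. It assumes that $\mathcal{H}(\mathsf{\bm{A}};\epsilon)$ is bounded by the base-2 logarithm of the size $N(\epsilon)$ of a smallest $\epsilon$-net of $\mathsf{\bm{A}}$ in $\mathsf{\bm{V}}$; the symbol $N(\epsilon)$ is introduced here only for illustration, and constants in $\epsilon$ are ignored.

```latex
% Heuristic only: assumes \mathcal{H}(\mathsf{A};\epsilon) \le \log_2 N(\epsilon),
% with N(\epsilon) the size of a smallest \epsilon-net of \mathsf{A} in \mathsf{V}, and n > 1.
\begin{align*}
\sup_{\mathcal{G}\in\mathsf{\bm{A}}}\,\inf_{\theta\in\Theta_n}
  \|\mathcal{G}-\Phi_n(\,\cdot\,;\theta)\|_{\mathsf{\bm{V}}} \le \epsilon_n
&\;\Longrightarrow\; N(\epsilon_n) \le |\Sigma_n| \le 2^n \\
&\;\Longrightarrow\; \exp\big(c\,\epsilon_n^{-1/\alpha}\big)
  \le \mathcal{H}(\mathsf{\bm{A}};\epsilon_n) \le n \\
&\;\Longrightarrow\; \epsilon_n \ge \Big(\frac{c}{\log n}\Big)^{\alpha}
  = c^{\alpha}\,\log(n)^{-\alpha}.
\end{align*}
```

In words: a family with at most $2^n$ realizations can serve as an $\epsilon_n$-net of the class only if $\epsilon_n$ decays no faster than $\log(n)^{-\alpha}$.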

A similar result holds for approximation of Lipschitz operators in an $L^p(\mu)$ sense, as shown in the following proposition (cp. Setting 2.4):

Proposition 2.14 (Approximation of generic operators in expectation).

Let $\mathcal{X}$ be a Banach space of input functions. Let $\mu\in\mathcal{P}(\mathcal{X})$ be a probability measure satisfying Assumption 2.10. Assume that the coefficients $\lambda_j\gtrsim j^{-2\alpha}$ as $j\to\infty$, where $\alpha>0$. Let $\{\Phi_n:\mathcal{X}\times\Theta_n\to\mathbb{R}\}_{n\in\mathbb{N}}$ be a sequence of bit-encoded neural operator architectures, with quantized parameter set $|\Theta_n|\leq 2^n$.
Then generic $\mathcal{G}\in\mathrm{Lip}_1(\mathcal{X})$ cannot be approximated by $\{\Phi_n\}$ at a convergence rate better than $\log(n)^{-(\alpha+1)}$; more precisely, for any sequence $\epsilon_n=o(\log(n)^{-(\alpha+1)})$, there is a residual subset $\mathsf{\bm{R}}\subset\mathrm{Lip}_1(\mathcal{X})$, such that for any $\mathcal{G}\in\mathsf{\bm{R}}$,

\[ \inf_{\theta\in\Theta_n}\|\mathcal{G}-\Phi_n(\,\cdot\,;\theta)\|_{L^p(\mu)}\neq O(\epsilon_n),\quad(n\to\infty). \]

Proof.

We let $\mathsf{\bm{V}}:=L^p(\mu)$ and $\mathsf{\bm{A}}:=\mathrm{Lip}_1(\mathcal{X})$. We note that $\mathsf{\bm{A}}\subset\mathsf{\bm{V}}$ is a compact, convex subset. We consider the subsets $\Sigma_n\subset\mathsf{\bm{V}}$, defined by all possible realizations,

\[ \Sigma_n := \left\{\Phi_n(\,\cdot\,;\theta) \,\middle|\, \theta\in\Theta_n\right\}. \]

By assumption, $|\Sigma_n|=|\Theta_n|\leq 2^n$. By Proposition 3.6, to be proved in Section 3.3, we have $\mathcal{H}(\mathsf{\bm{A}};\epsilon)_{\mathsf{\bm{V}}}\geq\exp(c\epsilon^{-1/(\alpha+1)})$. The claim of Proposition 2.14 then follows, as a special case, from the abstract result of Proposition 4.2 to be derived in Section 4. ∎

Remark 2.15.

The notion of a residual subset $\mathsf{\bm{R}}\subset\mathrm{Lip}_1(\mathcal{D})$ in Propositions 2.13 and 2.14 is to be understood with respect to the subspace topology on $\mathrm{Lip}_1(\mathcal{D})$, induced by the $C(\mathcal{K})$- and $L^p(\mu)$-norms, respectively.

2.6. Approximation of generic Lipschitz operators by FNO

The results of the previous section are formulated abstractly for an unspecified sequence of quantized neural operator architectures $\{\Phi_n\}$. To conclude the discussion of our main results, we illustrate some implications of these results for a concrete operator learning framework, the Fourier neural operator [39].

We note that although the derivation of these results will rely on Propositions 2.13 and 2.14, the ultimate statement of the theorems will be in terms of the number of tunable real-valued parameters of FNO, without bit-encoding. Thus, the gap between the bit-encoded and the real-valued points of view on parameters can be bridged in this case.

In preparation for stating these theorems for FNO, we briefly describe a specific setting to which FNO is applicable, and recall the FNO architecture. This is followed by the statement of a novel theorem establishing a curse of (exponential) parametric complexity for the FNO, in the uniform approximation setting.

FNO case study

As a case study, we consider Fourier neural operators (FNO), approximating a relevant class of $1$-Lipschitz operators,

\[ \mathcal{G}:\mathcal{K}\subset L^2(D;\mathbb{R}^{d_{\mathrm{in}}})\to\mathbb{R}, \]

mapping square-integrable input functions to the reals (or equivalently, to a space of constant-valued functions). Here $\mathcal{K}$ is a compact subset of $L^2(D;\mathbb{R}^{d_{\mathrm{in}}})$, consisting of square-integrable functions $u:D\to\mathbb{R}^{d_{\mathrm{in}}}$. We wish to approximate such a $1$-Lipschitz operator $\mathcal{G}$, uniformly over the compact set $\mathcal{K}$.

In the following, we will usually write $L^2(D)$ instead of $L^2(D;\mathbb{R}^{d_{\mathrm{in}}})$, where for simplicity and due to certain restrictions of the FNO architecture, the underlying domain $D=\mathbb{T}^d$ is taken to be the $1$-periodic torus $\mathbb{T}^d\simeq[0,1]^d$ in $d$ spatial dimensions, where in typical applications, $d\in\{1,2,3\}$. Prototypical examples of relevant $\mathcal{K}$ are $\mathcal{K}=\mathcal{U}(H^s(\mathbb{T}^d))$, where

\[ \mathcal{U}(H^s(\mathbb{T}^d)) = \left\{u\in H^s(\mathbb{T}^d) \,\middle|\, \|u\|_{H^s}\leq 1\right\}, \]

denotes the unit ball in the Sobolev space $H^s(\mathbb{T}^d)$ with smoothness $s>0$. The question to be addressed is: how many tunable parameters $q$ are needed to approximate generic $\mathcal{G}\in\mathrm{Lip}_1(\mathcal{K})_{L^2(D)}$ to a prescribed accuracy $\epsilon$?

FNO architecture

We here recall the general notion of Fourier neural operators [39]. Let $\mathcal{X}=\mathcal{X}(D;\mathbb{R}^{d_{\mathrm{in}}})$ and $\mathcal{Y}=\mathcal{Y}(D;\mathbb{R}^{d_{\mathrm{out}}})$ be two Banach function spaces, consisting of functions $u:D\to\mathbb{R}^{d_{\mathrm{in}}}$ and $w:D\to\mathbb{R}^{d_{\mathrm{out}}}$, respectively. A Fourier neural operator (FNO) defines a nonlinear operator

\[ \Phi_{\mathrm{FNO}}:\mathcal{X}(D;\mathbb{R}^{d_{\mathrm{in}}})\to\mathcal{Y}(D;\mathbb{R}^{d_{\mathrm{out}}}), \]

mapping between these spaces. By definition of the FNO architecture, such $\Phi_{\mathrm{FNO}}$ takes the form

\[ \Phi_{\mathrm{FNO}}(u;\theta) = Q\circ\mathcal{L}_L\circ\dots\circ\mathcal{L}_1\circ P(u), \tag{2.21} \]

where $P:\mathcal{X}\to\mathcal{V}$, $u(x)\mapsto Pu(x)$ is a linear lifting layer, $Q:\mathcal{V}\to\mathcal{Y}$, $v(x)\mapsto Qv(x)$ is a linear projection layer, and the $\mathcal{L}_\ell:\mathcal{V}(D;\mathbb{R}^{d_c})\to\mathcal{V}(D;\mathbb{R}^{d_c})$ are the hidden layers, mapping between hidden states $v\mapsto\mathcal{L}_\ell(v)\in\mathcal{V}(D;\mathbb{R}^{d_c})$. The hidden states are vector-valued functions with $d_c$ components, $v:D\to\mathbb{R}^{d_c}$, belonging to a Banach function space $\mathcal{V}(D;\mathbb{R}^{d_c})$.
Here, the “channel width” $d_c$ is a hyperparameter of the architecture. Each hidden layer $\mathcal{L}_\ell$ is of the form

\[ \mathcal{L}_\ell(v)(x) := \sigma\big(Wv(x)+Kv(x)+b\big) \]

where $W\in\mathbb{R}^{d_c\times d_c}$ is a matrix multiplying $v(x)$ pointwise, and $b\in\mathbb{R}^{d_c}$ is a bias. $K$ is a non-local operator of the form

\[ v(x)\mapsto(Kv)(x) := \mathcal{F}^{-1}\big(\widehat{P}_k\,\mathcal{F}v(k)\big)(x), \]

with $\mathcal{F}$ (and $\mathcal{F}^{-1}$) the Fourier transform (and its inverse). The matrix $\widehat{P}_k\in\mathbb{C}^{d_c\times d_c}$ is a tunable Fourier multiplier indexed by $k\in\mathbb{Z}^d$. It is assumed that $\widehat{P}_k\equiv 0$ for $|k|_{\ell^\infty}\geq\kappa$, i.e. for wavenumbers $k$ above a specified Fourier cut-off parameter $\kappa$. This Fourier cut-off $\kappa$ is a second hyperparameter of the FNO architecture.
We collect the values for different $k\in\mathbb{Z}^d$, $|k|_{\ell^\infty}<\kappa$, in a tensor $\widehat{P}=\{\widehat{P}_k\}_{|k|_{\ell^\infty}<\kappa}\in\mathbb{C}^{(2\kappa-1)^d\times d_c\times d_c}$, which acts on the Fourier coefficients $\widehat{v}(k)=\mathcal{F}(v)(k)$, by

\[ (\widehat{P}\widehat{v})(k)_i := \sum_{j=1}^{d_c}\widehat{P}_{k,ij}\,\widehat{v}_j(k),\quad(k\in\mathbb{Z}^d,\;|k|_{\ell^\infty}<\kappa). \]
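As an illustration of the hidden layer just described, the following NumPy sketch evaluates $\sigma(Wv+Kv+b)$ for $d=1$ on a uniform grid of the torus. It is a minimal sketch under stated assumptions rather than a reference FNO implementation: the function name `fourier_layer`, the grid discretization by the discrete Fourier transform, the storage order of `P_hat`, the default activation `np.tanh`, and the decision to keep only the real part of the inverse transform are all choices made here for illustration.

```python
import numpy as np


def fourier_layer(v, W, P_hat, b, kappa, sigma=np.tanh):
    """One hidden layer sigma(W v + K v + b) on a uniform grid of the 1D torus.

    v     : (N, d_c) real array, samples of the hidden state at N grid points
    W     : (d_c, d_c) real matrix, applied pointwise in x
    P_hat : (2*kappa - 1, d_c, d_c) complex array; entry [k + kappa - 1] is the
            multiplier for wavenumber k (this storage order is an assumption)
    b     : (d_c,) real bias
    """
    N, d_c = v.shape
    # The retained modes |k| < kappa must be resolved by the grid.
    assert N > 2 * (kappa - 1), "grid too coarse for the chosen cut-off"

    v_hat = np.fft.fft(v, axis=0)  # discrete Fourier coefficients, one row per mode
    wavenumbers = np.rint(np.fft.fftfreq(N, d=1.0 / N)).astype(int)

    Kv_hat = np.zeros_like(v_hat)  # modes with |k| >= kappa stay zero
    for row, k in enumerate(wavenumbers):
        if abs(k) < kappa:
            # (P_hat v_hat)(k)_i = sum_j P_hat[k, i, j] * v_hat_j(k)
            Kv_hat[row] = P_hat[k + kappa - 1] @ v_hat[row]

    # Keep the real part; it is exact when P_hat[-k] = conj(P_hat[k]).
    Kv = np.fft.ifft(Kv_hat, axis=0).real
    return sigma(v @ W.T + Kv + b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, d_c, kappa = 64, 4, 8
    v = rng.standard_normal((N, d_c))
    W = rng.standard_normal((d_c, d_c))
    P_hat = rng.standard_normal((2 * kappa - 1, d_c, d_c)).astype(complex)
    b = rng.standard_normal(d_c)
    print(fourier_layer(v, W, P_hat, b, kappa).shape)
```

Only the $(2\kappa-1)$ retained modes carry tunable multipliers, which is why the cut-off $\kappa$ enters the parameter count of the architecture.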

The resulting FNO architecture depends on the channel width dcsubscript𝑑𝑐{d_{c}}italic_d start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT, Fourier cut-off parameter κ𝜅\kappaitalic_κ and depth L𝐿Litalic_L. We collect all tunable parameters in a vector θq𝜃superscript𝑞\theta\in\mathbb{R}^{q}italic_θ ∈ blackboard_R start_POSTSUPERSCRIPT italic_q end_POSTSUPERSCRIPT. Any parameter θq𝜃superscript𝑞\theta\in\mathbb{R}^{q}italic_θ ∈ blackboard_R start_POSTSUPERSCRIPT italic_q end_POSTSUPERSCRIPT can be decomposed layer-wise, as

θ=(θL+1,θL,,θ1,θ0),𝜃subscript𝜃𝐿1subscript𝜃𝐿subscript𝜃1subscript𝜃0\theta=(\theta_{L+1},\theta_{L},\dots,\theta_{1},\theta_{0}),italic_θ = ( italic_θ start_POSTSUBSCRIPT italic_L + 1 end_POSTSUBSCRIPT , italic_θ start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT , … , italic_θ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_θ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) ,

where

θ={Wij(),P^k,ij(),b^k()|i,j=1,,dc,|k|<κ,kd},subscript𝜃conditional-setsubscriptsuperscript𝑊𝑖𝑗superscriptsubscript^𝑃𝑘𝑖𝑗superscriptsubscript^𝑏𝑘formulae-sequence𝑖𝑗1subscript𝑑𝑐𝑘𝜅𝑘superscript𝑑\theta_{\ell}={\left\{W^{(\ell)}_{ij},\widehat{P}_{k,ij}^{(\ell)},\widehat{b}_% {k}^{(\ell)}\,\middle|\,i,j=1,\dots,{d_{c}},\,|k|<\kappa,\,k\in\mathbb{Z}^{d}% \right\}},italic_θ start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT = { italic_W start_POSTSUPERSCRIPT ( roman_ℓ ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT , over^ start_ARG italic_P end_ARG start_POSTSUBSCRIPT italic_k , italic_i italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( roman_ℓ ) end_POSTSUPERSCRIPT , over^ start_ARG italic_b end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( roman_ℓ ) end_POSTSUPERSCRIPT | italic_i , italic_j = 1 , … , italic_d start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT , | italic_k | < italic_κ , italic_k ∈ blackboard_Z start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT } ,

collects the parameters of the \ellroman_ℓ-th hidden layer, for 1L1𝐿1\leq\ell\leq L1 ≤ roman_ℓ ≤ italic_L. We denote by θ0={Pij|i,j=1,,dc}subscript𝜃0conditional-setsubscript𝑃𝑖𝑗formulae-sequence𝑖𝑗1subscript𝑑𝑐\theta_{0}={\left\{P_{ij}\,\middle|\,i,j=1,\dots,{d_{c}}\right\}}italic_θ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = { italic_P start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT | italic_i , italic_j = 1 , … , italic_d start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT } the parameters of the projection P𝑃Pitalic_P and by θL+1={Qij|i,j=1,,dc}subscript𝜃𝐿1conditional-setsubscript𝑄𝑖𝑗formulae-sequence𝑖𝑗1subscript𝑑𝑐\theta_{L+1}={\left\{Q_{ij}\,\middle|\,i,j=1,\dots,{d_{c}}\right\}}italic_θ start_POSTSUBSCRIPT italic_L + 1 end_POSTSUBSCRIPT = { italic_Q start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT | italic_i , italic_j = 1 , … , italic_d start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT } the parameters of lifting Q𝑄Qitalic_Q. Assuming that din,doutdcsubscript𝑑insubscript𝑑outsubscript𝑑𝑐d_{\mathrm{in}},d_{\mathrm{out}}\leq d_{c}italic_d start_POSTSUBSCRIPT roman_in end_POSTSUBSCRIPT , italic_d start_POSTSUBSCRIPT roman_out end_POSTSUBSCRIPT ≤ italic_d start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT, the dimension of θq𝜃superscript𝑞\theta\in\mathbb{R}^{q}italic_θ ∈ blackboard_R start_POSTSUPERSCRIPT italic_q end_POSTSUPERSCRIPT satisfies,

$$q = d_c d_{\mathrm{in}} + L\left( d_c^2 + (2\kappa)^d d_c^2 + d_c \right) + d_c d_{\mathrm{out}} \leq 5 (2\kappa)^d L d_c^2 \leq 5q. \tag{2.22}$$
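The count in (2.22) can be checked numerically. The following is a minimal sketch, not taken from the paper: the function names and the example hyperparameter values are hypothetical, and it assumes $d_{\mathrm{in}}, d_{\mathrm{out}} \leq d_c$, $L \geq 1$ and $\kappa \geq 1$.

```python
def fno_parameter_count(d_c, d_in, d_out, L, kappa, d):
    """Total number of parameters q, following the left-hand side of (2.22)."""
    input_layer = d_c * d_in                      # parameters of P
    hidden = L * (d_c**2                          # pointwise weights W
                  + (2 * kappa)**d * d_c**2       # Fourier multipliers P-hat
                  + d_c)                          # bias term, as counted in (2.22)
    output_layer = d_c * d_out                    # parameters of Q
    return input_layer + hidden + output_layer


def fno_parameter_bound(d_c, L, kappa, d):
    """Upper bound 5 (2 kappa)^d L d_c^2 from (2.22)."""
    return 5 * (2 * kappa)**d * L * d_c**2


# Illustrative (hypothetical) hyperparameters: a 2d FNO with 4 layers.
q = fno_parameter_count(d_c=32, d_in=1, d_out=1, L=4, kappa=12, d=2)
bound = fno_parameter_bound(d_c=32, L=4, kappa=12, d=2)
print(q, bound, q <= bound <= 5 * q)
```

For these values the Fourier multipliers account for almost all of $q$, which is why the single term $(2\kappa)^d L d_c^2$ captures the parameter count up to a constant factor.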

Consistent with practical implementations, it is generally assumed that the hidden channel dimension $d_c$ of the FNO is at least as large as both the input and output dimensions $d_{\mathrm{in}}, d_{\mathrm{out}}$. We include a list of hyperparameters in Table 3 to help clarify the notation.

Remark 2.16.

Since we are interested in a restricted class of operators $\mathcal{G}: L^2(D) \to \mathbb{R}$ with real-valued outputs, we will replace the general output layer $\mathcal{Q}: \mathcal{V}(D;\mathbb{R}^{d_c}) \to \mathcal{Y}(D;\mathbb{R}^{d_{\mathrm{out}}})$ by a spatially averaged, real-valued version $\widetilde{\mathcal{Q}}: \mathcal{V}(D;\mathbb{R}^{d_c}) \to \mathbb{R}$,

$$\widetilde{\mathcal{Q}} v := \fint_D \mathcal{Q} v(x) \, dx.$$

This does not affect the parameter-count, while ensuring real-valued outputs. We will refer to this as an output-averaged FNO.
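On a discretized domain, the averaged output layer can be sketched as follows. This is only an illustration under stated assumptions: $\mathcal{Q}$ is taken to be a pointwise linear map with $d_{\mathrm{out}} = 1$ (so it has $d_c$ weights), the integral mean is replaced by an equal-weight mean over grid samples, and the names `output_average`, `v_samples` and `q_weights` are hypothetical.

```python
def output_average(v_samples, q_weights):
    """Equal-weight grid approximation of the mean over D of (Q v)(x).

    v_samples: list of channel vectors v(x_j) in R^{d_c}, one per grid point.
    q_weights: the d_c weights of an assumed pointwise linear map Q: R^{d_c} -> R.
    """
    # Apply Q pointwise, then average over all grid points.
    values = [sum(w * c for w, c in zip(q_weights, v_x)) for v_x in v_samples]
    return sum(values) / len(values)


# Two grid points, two channels: (Q v) takes the values -1.5 and -2.5.
result = output_average([[1.0, 2.0], [3.0, 4.0]], [0.5, -1.0])
print(result)
```

The averaging step introduces no new weights, in line with the remark that the parameter count is unchanged.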

In passing, and in connection with the last remark, we mention relevant work considering variants of FNO for finite-dimensional input and/or output spaces [24], where similar alterations to the original FNO architecture have been studied in greater detail.

Symbol      Meaning
$d_c$       channel width
$\kappa$    Fourier cut-off
$L$         depth
$q$         total number of parameters
$M$         parameter bound, $\|\theta\|_{\ell^\infty} \leq M$

Table 3. Summary of (hyper-)parameters of the FNO architecture. This notation is used throughout this subsection.
Generic curse of parametric complexity for FNO

Our main theorem is based on Proposition 2.13, and establishes a generic curse of parametric complexity for FNO. In contrast to the aforementioned proposition, this theorem holds at the level of continuous real-valued parameters $\theta \in \mathbb{R}^q$, without requiring specification of a bit-encoding. Instead, we assume a mild bound on the parameters $\theta \in \mathbb{R}^q$. We note that similar assumptions have been considered in the recent work [28], to define relevant approximation spaces of FNO. To this end, we make the following definition:

Definition 2.17.

Given an operator $\mathcal{G}: L^2(D) \to \mathbb{R}$ and $\gamma > 0$, we will say that $\mathcal{G}$ can be approximated by FNO at a logarithmic rate $\gamma$, if there exists a sequence $\{\Phi_q\}_{q \in \mathbb{N}}$ of output-averaged FNO architectures $\Phi_q: L^2(D) \times \mathbb{R}^q \to \mathbb{R}$ with at most $q$ tunable parameters, and a sequence of parameters $\theta_q \in \mathbb{R}^q$, satisfying the bound

$$\|\theta_q\|_{\ell^\infty} \leq \exp(q),$$

and

$$\|\mathcal{G} - \Phi_q(\,\cdot\,;\theta_q)\|_{C(\mathcal{K})} = O(\log(q)^{-\gamma}), \quad (q \to \infty).$$
Remark 2.18.

The specific upper bound on the weights, $\|\theta_q\|_{\ell^\infty} \leq \exp(q)$, is here chosen for simplicity. For the following discussion, it could readily be replaced by a more general upper bound, $\|\theta_q\|_{\ell^\infty} \leq c_1 \exp(c_2 q^{c_3})$ for fixed constants $c_1$, $c_2$, $c_3$, without affecting the main conclusions.

We can now state our main result for FNO:

Theorem 2.19.

Let $\mathcal{K} \subset L^2(D)$ be compact. Assume that the metric entropy of $\mathcal{K}$ satisfies an algebraic lower bound, $\mathcal{H}(\mathcal{K};\epsilon)_{L^2(D)} \gtrsim \epsilon^{-1/\alpha}$ for some $\alpha > 0$. Consider FNO with a fixed Lipschitz continuous activation function $\sigma$. Then generic $\mathcal{G} \in \mathrm{Lip}_1(\mathcal{K})$ cannot be approximated by FNO at a logarithmic rate $\gamma$, for any $\gamma > \alpha$.

Thus, loosely speaking and under mild growth assumptions on the weights, the approximation of generic $\mathcal{G} \in \mathrm{Lip}_1(\mathcal{K})$ to accuracy $\epsilon > 0$ requires an FNO architecture with exponentially many tunable parameters in $\epsilon^{-1}$.
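To make the phrase "exponentially many" concrete, the following heuristic calculation (an illustration only, not part of the theorem) inverts a logarithmic error rate. The constant $C > 0$ is hypothetical, and the error is assumed to scale exactly like $C \log(q)^{-\gamma}$ for $q > 1$.

```latex
% Heuristic only: assume the error with q > 1 parameters scales like
% C * log(q)^{-gamma}; C > 0 and gamma > 0 are illustrative constants.
\[
  C \log(q)^{-\gamma} \le \epsilon
  \quad\Longleftrightarrow\quad
  \log(q) \ge (C/\epsilon)^{1/\gamma}
  \quad\Longleftrightarrow\quad
  q \ge \exp\!\left( (C/\epsilon)^{1/\gamma} \right).
\]
```

Since Theorem 2.19 rules out every rate $\gamma > \alpha$ for generic operators, this reading suggests a parameter count that grows at least like $\exp(c\,\epsilon^{-1/\alpha})$ along a sequence of accuracies, for some constant $c > 0$.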

The following corollary is obtained by taking $\mathcal{K} = \mathcal{U}(H^s(\mathbb{T}^d))$ to be the unit ball in a Sobolev space $H^s(\mathbb{T}^d)$ for $s > 0$, with $\mathbb{T}^d$ the $d$-dimensional periodic torus. The metric entropy of this set in $L^2(\mathbb{T}^d)$ scales as $\epsilon^{-d/s}$, so that Theorem 2.19 applies with $\alpha = s/d$:

Corollary 2.20.

Let $s > 0$, and denote $\mathcal{K} = \mathcal{U}(H^s(\mathbb{T}^d))$. Then generic $\mathcal{G} \in \mathrm{Lip}_1(\mathcal{K})$ cannot be approximated by FNO at logarithmic rate $\gamma$, for any $\gamma > s/d$.

Proof of Theorem 2.19.

Fix $\gamma > \alpha$. We wish to show that generic $\mathcal{G} \in \mathrm{Lip}_1(\mathcal{K})$ cannot be approximated at logarithmic rate $\gamma$. The proof of this claim will make use of the following lemma:

Lemma 2.21 (FNO quantization lemma).

Fix a Lipschitz continuous activation function $\sigma$. Let $\gamma > 0$. For any $q \in \mathbb{N}$, there exists a quantized neural operator $\widetilde{\Phi}_{n_q}: L^2(D) \times \{0,1\}^{n_q} \to \mathbb{R}$ with $2^{n_q}$ quantized parameter values, where $n_q \asymp q^m$, $m = d + 6$, such that for any output-averaged FNO $\Phi_q$ with activation $\sigma$ and at most $q$ tunable parameters, we have

$$\sup_{\theta \in [-M_q, M_q]^q} \ \inf_{[\theta] \in \{0,1\}^{n_q}} \|\Phi_q(\,\cdot\,;\theta) - \widetilde{\Phi}_{n_q}(\,\cdot\,;[\theta])\|_{C(\mathcal{K})} \leq \log(q)^{-\gamma},$$

where $M_q := \exp(q)$.

Brief sketch of proof.

The detailed proof of this lemma is included in Appendix B; in short, the proof relies on two observations: (i) all possible FNO architectures with at most $q$ parameters can be encapsulated by a "super" FNO-architecture $\widehat{\Phi}(\,\cdot\,;\theta)$ with a number of parameters that is bounded algebraically in $q$, with a fixed exponent, and (ii) quantization of this super-architecture with an algebraically bounded number of bits is possible, since the mapping $\theta \mapsto \widehat{\Phi}(\,\cdot\,;\theta)$ has at least a weak form of stability (Lipschitz continuity) over the relevant range of parameters $\theta$, and a Lipschitz constant that grows at a sufficiently slow rate as a function of $q$. ∎
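The bit-counting idea behind observation (ii) can be illustrated with a uniform grid quantizer. This is a minimal sketch and not the construction of Appendix B: the function names are hypothetical, and the grid spacing `delta` is an assumed input that would, in the actual argument, be chosen from the Lipschitz constant of the parameter-to-operator map and the target accuracy.

```python
import math


def bits_per_parameter(M, delta):
    """Bits needed to index a uniform grid on [-M, M] with spacing at most delta."""
    levels = math.ceil(2 * M / delta) + 1
    return max(1, math.ceil(math.log2(levels)))


def encode(theta, M, bits):
    """Map each parameter in [-M, M] to the index of its nearest grid point."""
    levels = 2 ** bits
    step = 2 * M / (levels - 1)
    return [min(levels - 1, max(0, round((t + M) / step))) for t in theta]


def decode(codes, M, bits):
    """Recover the grid point associated with each index."""
    levels = 2 ** bits
    step = 2 * M / (levels - 1)
    return [-M + c * step for c in codes]


# Illustrative values (hypothetical): three parameters bounded by M = 2.
theta = [0.3, -1.7, 2.0]
M, delta = 2.0, 0.01
bits = bits_per_parameter(M, delta)
theta_hat = decode(encode(theta, M, bits), M, bits)
max_error = max(abs(a - b) for a, b in zip(theta, theta_hat))
print(bits, len(theta) * bits, max_error)
```

Each parameter costs roughly $\log_2(2M/\delta)$ bits. With $M = \exp(q)$ this is of order $q + \log_2(1/\delta)$, so the total over $q$ parameters stays polynomial in $q$ as long as $1/\delta$ grows at most exponentially in a power of $q$.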

By Lemma 2.21, there exists $m \in \mathbb{N}$, a sequence $n_q \asymp q^m$, and a sequence of quantized neural operators, $\widetilde{\Phi}_{n_q}: L^2(D) \times \{0,1\}^{n_q} \to \mathbb{R}$, such that

$$\sup_{\theta \in [-M_q, M_q]^q} \ \inf_{[\theta] \in \{0,1\}^{n_q}} \|\Phi_q(\,\cdot\,;\theta) - \widetilde{\Phi}_{n_q}(\,\cdot\,;[\theta])\|_{C(\mathcal{K})} \leq \log(q)^{-\gamma}.$$

Associated with this subsequence $n_q \to \infty$, we now define an (abstract) sequence of bit-encoded neural operators for arbitrary $n \in \mathbb{N}$; specifically, we define $\widetilde{\Phi}_n(\,\cdot\,;\,\cdot\,): L^2(D) \times \{0,1\}^n \to \mathbb{R}$ by

$$\widetilde{\Phi}_n(\,\cdot\,;[\theta]_n) := \widetilde{\Phi}_{n_q}(\,\cdot\,;[\theta]_{n_q}), \quad [\theta]_n \in \{0,1\}^n,$$

where $n_q$ is chosen maximal such that $n_q \leq n$, and $[\theta]_{n_q}$ are the first $n_q \leq n$ bits of $[\theta]_n$ (the values of the remaining bits are simply ignored). We note that since $n_q \asymp q^m$, we have $\log(q)^{-\gamma} \asymp \log(n_q)^{-\gamma}$. Furthermore, for an arbitrary fixed operator $\mathcal{G}$, we note that the decay

$$\inf_{[\theta] \in \{0,1\}^{n_q}} \|\mathcal{G} - \widetilde{\Phi}_{n_q}(\,\cdot\,;[\theta])\|_{C(\mathcal{K})} \lesssim \log(n_q)^{-\gamma},$$

along the specified subsequence $n_q \asymp q^m$ also implies the error decay

$$\inf_{[\theta] \in \{0,1\}^{n}} \|\mathcal{G} - \widetilde{\Phi}_{n}(\,\cdot\,;[\theta])\|_{C(\mathcal{K})} \lesssim \log(n)^{-\gamma}, \tag{2.23}$$

along the full sequence $n \in \mathbb{N}$, as $n \to \infty$. This is immediate from the definition of $\widetilde{\Phi}_n$ and the fact that $n_q \leq n < n_{q+1}$ does not leave exponential gaps between subsequent $n_q$, since $1 \leq n_{q+1}/n_q \asymp (q+1)^m / q^m = O(1)$; in particular, this implies that $\log(n_q) \sim \log(n_{q+1}) \sim \log(n)$.
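The extension from the subsequence $n_q$ to arbitrary bit lengths $n$ amounts to truncating the bit string. The sketch below illustrates this under a simplifying assumption: it takes $n_q = q^m$ exactly, whereas the lemma only gives $n_q \asymp q^m$. The function names are hypothetical.

```python
import math


def largest_q(n, m):
    """Largest integer q >= 1 with q**m <= n (requires n >= 1)."""
    q = max(1, round(n ** (1.0 / m)))
    # Correct the floating-point guess in both directions.
    while q ** m > n:
        q -= 1
    while (q + 1) ** m <= n:
        q += 1
    return q


def truncate_bits(bits, m):
    """Keep the first n_q = q**m bits, with q maximal; ignore the rest."""
    q = largest_q(len(bits), m)
    return q, bits[: q ** m]


# Illustrative check with m = 3: a string of 999 bits is cut to 9**3 = 729 bits.
q, used = truncate_bits([0] * 999, m=3)
print(q, len(used), math.log(len(used)) / math.log(999))
```

The printed ratio $\log(n_q)/\log(n)$ stays close to one, which is the reason why the logarithmic rate carries over from the subsequence to the full sequence.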

By Proposition 2.13, the set of operators $\mathsf{\bm{M}} \subset \mathrm{Lip}_1(\mathcal{K})$ which can be approximated by such a sequence $\{\widetilde{\Phi}_n\}$, at logarithmic rate $\gamma$, is meagre (its complement is residual). To conclude the argument, it therefore suffices to show that if $\mathcal{G}$ can be approximated by FNO at logarithmic rate $\gamma$, then $\mathcal{G} \in \mathsf{\bm{M}}$. This then implies that the set of operators that can be approximated by FNO at logarithmic rate $\gamma$ is a subset of $\mathsf{\bm{M}}$, and hence is itself meagre.

To this end, assume that $\mathcal{G} \in \mathrm{Lip}_1(\mathcal{K})$ is approximated by FNO at logarithmic rate $\gamma$. By definition, there exists a sequence of FNOs, $\Phi_q: L^2(D) \times \mathbb{R}^q \to \mathbb{R}$, such that

$$\inf_{\theta \in [-M_q, M_q]^q} \|\mathcal{G} - \Phi_q(\,\cdot\,;\theta)\|_{C(\mathcal{K})} = O(\log(q)^{-\gamma}).$$

By the triangle inequality,

$$\begin{aligned}
\inf_{[\theta] \in \{0,1\}^{n_q}} \|\mathcal{G} - \widetilde{\Phi}_{n_q}(\,\cdot\,;[\theta])\|_{C(\mathcal{K})}
&\leq \|\mathcal{G} - \Phi_q(\,\cdot\,;\theta_q)\|_{C(\mathcal{K})} + \inf_{[\theta] \in \{0,1\}^{n_q}} \|\Phi_q(\,\cdot\,;\theta_q) - \widetilde{\Phi}_{n_q}(\,\cdot\,;[\theta])\|_{C(\mathcal{K})} \\
&\leq O(\log(q)^{-\gamma}) + O(\log(n_q)^{-\gamma}) = O(\log(n_q)^{-\gamma}),
\end{aligned}$$

along the specified sequence $n_q \to \infty$. By (2.23), this implies that

$$\inf_{[\theta] \in \{0,1\}^{n}} \|\mathcal{G} - \widetilde{\Phi}_n(\,\cdot\,;[\theta])\|_{C(\mathcal{K})} = O(\log(n)^{-\gamma}),$$

along the entire sequence $n \to \infty$, and hence $\mathcal{G} \in \mathsf{\bm{M}}$, i.e. $\mathcal{G}$ belongs to the meagre set of operators which can be approximated by the sequence $\{\widetilde{\Phi}_n\}$ at logarithmic rate $\gamma$.

We have shown that any operator $\mathcal{G}$ that is approximated by FNO at logarithmic rate $\gamma$ belongs to the meagre set $\mathsf{\bm{M}}$. Hence, the set of operators that are approximated by FNO at logarithmic rate $\gamma$ is itself meagre, and its complement contains the residual set $\mathsf{\bm{R}} = \mathrm{Lip}_1(\mathcal{K}) \setminus \mathsf{\bm{M}}$. We conclude that generic operators $\mathcal{G} \in \mathrm{Lip}_1(\mathcal{K})$, belonging to $\mathsf{\bm{R}}$, cannot be approximated at logarithmic rate $\gamma > \alpha$. ∎

3. The metric entropy of Lipschitz operators

In the present section, we provide lower bounds on the metric entropy of Lipschitz operators in two general settings: the first pertains to the sup-norm over a compact set of inputs, the second is of relevance to approximation in the Bochner $L^p$-norm with respect to a probability measure on the input space. After briefly recalling the relation between covering and packing numbers, we proceed to consider the sup-norm setting in Section 3.2 and the $L^p$-setting in Section 3.3.

3.1. Entropy, covering and packing

We recall from Definition 2.6 that the metric entropy $\mathcal{H}(\mathsf{\bm{A}};\epsilon)_{\mathsf{\bm{V}}}$ of a subset $\mathsf{\bm{A}} \subset \mathsf{\bm{V}}$ is defined by $\mathcal{H}(\mathsf{\bm{A}};\epsilon)_{\mathsf{\bm{V}}} = \log_2 \mathcal{N}(\mathsf{\bm{A}};\epsilon)_{\mathsf{\bm{V}}}$; here, $\mathcal{N}(\mathsf{\bm{A}};\epsilon)_{\mathsf{\bm{V}}}$ denotes the covering number of $\mathsf{\bm{A}}$, which is defined as the smallest number of closed balls of radius $\epsilon$ needed to cover $\mathsf{\bm{A}}$. We also recall the closely related notion of a packing number:

Definition 3.1 (Packing number).

Let $(\mathsf{\bm{V}}, d)$ be a metric space. The packing number of a subset $\mathsf{\bm{A}} \subset \mathsf{\bm{V}}$, denoted $\mathcal{M}(\mathsf{\bm{A}};\epsilon)_{\mathsf{\bm{V}}}$, is the largest integer $M \in \mathbb{N}$ for which there exist elements $u_1, \dots, u_M \in \mathsf{\bm{A}}$, with pairwise distance $d(u_j, u_k) \geq \epsilon$, for all distinct $j, k \in \{1, \dots, M\}$.

With our definitions, the following inequalities between covering and packing numbers are elementary: for any subset $\mathsf{\bm{A}} \subset \mathsf{\bm{V}}$, we have

$$\mathcal{M}(\mathsf{\bm{A}};3\epsilon)_{\mathsf{\bm{V}}} \leq \mathcal{N}(\mathsf{\bm{A}};\epsilon)_{\mathsf{\bm{V}}} \leq \mathcal{M}(\mathsf{\bm{A}};\epsilon)_{\mathsf{\bm{V}}}. \tag{3.1}$$

We mention that, if the covering number is defined by open balls, the factor $3$ in the first term could have been replaced by $2$. With our closed definition, any factor $> 2$ would do; we here choose $3$ for simplicity.
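The second inequality in (3.1) rests on the observation that a maximal packing is automatically a covering. The following sketch illustrates this on a finite point set; it is an illustration under stated assumptions, not part of the paper. The function names are hypothetical, and the greedy procedure only produces a maximal packing, not necessarily one of largest cardinality.

```python
def greedy_packing(points, eps, dist):
    """Greedily select points with pairwise distance >= eps (a maximal packing)."""
    centers = []
    for p in points:
        if all(dist(p, c) >= eps for c in centers):
            centers.append(p)
    return centers


def is_cover(points, centers, eps, dist):
    """Check that closed eps-balls around the centers cover all points."""
    return all(any(dist(p, c) <= eps for c in centers) for p in points)


def abs_dist(x, y):
    return abs(x - y)


# Illustrative point set on the real line.
points = [0.0, 0.1, 0.25, 0.4, 0.45, 0.9, 1.0]
centers = greedy_packing(points, 0.3, abs_dist)
print(centers, is_cover(points, centers, 0.3, abs_dist))
```

Every point that was rejected lies within distance less than $\epsilon$ of a selected center, so the selected centers form an $\epsilon$-cover whose size is at most the packing number.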

3.2. Uniform approximation

We are here interested in the uniform setting (Setting 2.2), i.e. the uniform approximation of a (real-valued) mapping $\mathcal{G}: \mathcal{K} \to \mathbb{R}$ over a compact domain $\mathcal{K} \subset \mathcal{X}$.

As pointed out before, given the link between minimax code-length and metric entropy, we are interested in estimating the metric entropy of Lip1(𝒦)subscriptLip1𝒦\mathrm{Lip}_{1}(\mathcal{K})roman_Lip start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( caligraphic_K ) for 𝒦𝒦\mathcal{K}caligraphic_K a compact metric space. The following proposition relates the metric entropy of Lip1(𝒦)𝗩subscriptLip1𝒦𝗩\mathrm{Lip}_{1}(\mathcal{K})\subset\mathsf{\bm{V}}roman_Lip start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( caligraphic_K ) ⊂ bold_sansserif_V to that of 𝒦𝒦\mathcal{K}caligraphic_K, when 𝗩=C(𝒦)𝗩𝐶𝒦\mathsf{\bm{V}}=C(\mathcal{K})bold_sansserif_V = italic_C ( caligraphic_K ) is metrized by the sup-norm:

Proposition 3.2.

Let $(\mathcal{K},d)$ be a metric space. Let $\epsilon\in(0,1/3]$. The metric entropy of $\mathrm{Lip}_1(\mathcal{K})\subset C(\mathcal{K})$ is lower bounded by

$$\mathcal{H}(\mathrm{Lip}_1(\mathcal{K});\epsilon)_{C(\mathcal{K})}\geq 2^{\mathcal{H}(\mathcal{K};6\epsilon)_{\mathcal{X}}}. \tag{3.2}$$

Proposition 3.2 shows that the space of $1$-Lipschitz functions on a compact metric space has exponentially larger entropy than the underlying space.

Proof.

Let $\epsilon\in(0,1/3]$ be given. Let $N=\mathcal{N}(\mathcal{K};6\epsilon)_{\mathcal{X}}$. Since the covering number lower bounds the packing number (cf. (3.1)), there exist $N$ elements $u_1,\dots,u_N\in\mathcal{K}$ with pairwise distance $\geq 6\epsilon$. Let

$$\psi_j(u):=\max(3\epsilon-d(u,u_j),0),\quad j=1,\dots,N,$$

denote “hat” functions centered at $u_j$, and non-vanishing only on $B_{3\epsilon}(u_j)\subset\mathcal{K}$. We note that each $\psi_j$ is $1$-Lipschitz, satisfies $\|\psi_j\|_{C(\mathcal{K})}=3\epsilon$, and the supports of the $\psi_j$ are essentially disjoint.

We now consider the set of Lipschitz functions $f:\mathcal{K}\to\mathbb{R}$ of the form

$$f_\sigma(u)=\sum_{j=1}^{N}\sigma_j\psi_j(u),\quad\sigma=(\sigma_1,\dots,\sigma_N)\in\{0,1\}^N.$$

These functions satisfy $\|f_\sigma\|_{C(\mathcal{K})}\leq 3\epsilon\leq 1$, and $\mathrm{Lip}(f_\sigma)\leq\max_{j=1,\dots,N}\mathrm{Lip}(\psi_j)=1$, for all choices of $\sigma$. Furthermore, if $\sigma,\sigma'\in\{0,1\}^N$ are two distinct elements, say with $\sigma_{j_0}\neq\sigma'_{j_0}$, then it is straightforward to show that $\|f_\sigma-f_{\sigma'}\|_{C(\mathcal{K})}\geq\|\psi_{j_0}\|_{C(\mathcal{K})}=3\epsilon$.

Thus, we have shown that there exist $2^N=|\{0,1\}^N|$ functions $f_\sigma\in\mathrm{Lip}_1(\mathcal{K})$ with pairwise $C(\mathcal{K})$-distance $\geq 3\epsilon$. In particular, this implies that the packing number $\mathcal{M}(\mathrm{Lip}_1(\mathcal{K});3\epsilon)_{C(\mathcal{K})}\geq 2^N$, and by the inequality (3.1) between packing and covering numbers, this now implies that

$$\mathcal{N}(\mathrm{Lip}_1(\mathcal{K});\epsilon)_{C(\mathcal{K})}\geq\mathcal{M}(\mathrm{Lip}_1(\mathcal{K});3\epsilon)_{C(\mathcal{K})}\geq 2^N.$$

The claim follows by taking logarithms and recalling that $N=\mathcal{N}(\mathcal{K};6\epsilon)=2^{\mathcal{H}(\mathcal{K};6\epsilon)_{\mathcal{X}}}$. ∎
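The construction in the proof can be checked on a toy example. The following is a minimal sketch, assuming $\mathcal{K}=[0,1]$ with $d(u,v)=|u-v|$, the illustrative value $\epsilon=1/18$, and a hand-picked $6\epsilon$-separated set of four centers (these choices, and the names `hat`, `f_sigma` and `sup_dist`, are assumptions made for illustration and are not taken from the paper). Exact rational arithmetic is used so that the sup-distances on the grid can be compared without rounding.

```python
from fractions import Fraction
from itertools import product

eps = Fraction(1, 18)

# Hypothetical 6*eps-separated points u_1, ..., u_4 in K = [0, 1].
centers = [Fraction(k, 3) for k in range(4)]

# Evaluation grid with spacing 1/90; it contains every center exactly.
grid = [Fraction(i, 90) for i in range(91)]


def hat(u, center):
    """Hat function psi_j(u) = max(3*eps - d(u, u_j), 0)."""
    return max(3 * eps - abs(u - center), Fraction(0))


def f_sigma(sigma, u):
    """f_sigma(u) = sum_j sigma_j * psi_j(u) for a bit vector sigma."""
    return sum(s * hat(u, c) for s, c in zip(sigma, centers))


# All 2^N bit vectors and the grid values of the associated functions.
sigmas = list(product((0, 1), repeat=len(centers)))
values = {s: [f_sigma(s, u) for u in grid] for s in sigmas}


def sup_dist(s, t):
    """Sup-distance between f_s and f_t, evaluated on the grid."""
    return max(abs(a - b) for a, b in zip(values[s], values[t]))


pairwise = [
    sup_dist(s, t)
    for i, s in enumerate(sigmas)
    for t in sigmas[i + 1:]
]
print("number of functions:", len(sigmas))
print("smallest pairwise sup-distance:", min(pairwise), "and 3*eps =", 3 * eps)
```

In this toy setting, the $2^4$ functions are pairwise separated by exactly $3\epsilon$, because two distinct bit vectors differ in some index $j_0$ and the difference attains the value $3\epsilon$ at the center $u_{j_0}$.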

We conclude this section with several corollaries of Proposition 3.2.

Corollary 3.3 (Lipschitz functions on finite-dimensional domains).

If $D\subset\mathbb{R}^d$ is a compact domain in Euclidean space, then

$$\mathcal{H}(\mathrm{Lip}_1(D);\epsilon)\gtrsim\epsilon^{-d}.$$
Proof.

It is a well-known fact that

$$\mathcal{N}(D;\epsilon)\gtrsim\epsilon^{-d},$$

with an implied constant depending on the dimension $d$ and the volume of $D$; for example, this follows from a simple volume argument for an $\epsilon$-covering $D\subset\bigcup_{n=1}^{N}\overline{B_\epsilon}(x_n)$, which yields

$$\mathrm{vol}(D)\leq\mathrm{vol}\left(\bigcup_{n=1}^{N}\overline{B_\epsilon}(x_n)\right)\leq N\,\mathrm{vol}(\overline{B_\epsilon})=NC_d\epsilon^d\;\Rightarrow\;N\geq\frac{\mathrm{vol}(D)}{C_d\epsilon^d}.$$

The claim thus follows from Proposition 3.2. ∎
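For concreteness, the two ingredients of this proof can be chained as follows. This is a sketch under stated assumptions: $\mathrm{vol}(D)>0$, $C_d$ denotes the volume of the Euclidean unit ball, $\epsilon\in(0,1/3]$, and the metric entropy is the base-2 logarithm of the covering number, as used at the end of the proof of Proposition 3.2.

```latex
% Assumptions: vol(D) > 0, C_d = volume of the Euclidean unit ball in R^d,
% \epsilon \in (0, 1/3], and \mathcal{H} = \log_2 \mathcal{N}.
\begin{align*}
\mathcal{H}(\mathrm{Lip}_1(D);\epsilon)_{C(D)}
  &\geq 2^{\mathcal{H}(D;6\epsilon)}
     && \text{(Proposition 3.2)} \\
  &= \mathcal{N}(D;6\epsilon)
     && \text{(definition of metric entropy)} \\
  &\geq \frac{\mathrm{vol}(D)}{C_d\,(6\epsilon)^{d}}
     && \text{(volume argument with radius } 6\epsilon\text{)} \\
  &= \frac{\mathrm{vol}(D)}{6^{d}\,C_d}\,\epsilon^{-d}.
\end{align*}
```

The implied constant in the corollary can thus be taken as $\mathrm{vol}(D)/(6^d C_d)$ in this sketch.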

Corollary 3.4 (Lipschitz functionals on Sobolev spaces).

Let $D\subset\mathbb{R}^d$ be a compact domain in Euclidean space. Let $\mathcal{K}=\mathcal{U}(W^{s,p}(D))$ be the unit ball in the space of Sobolev functions possessing $s>0$ weak derivatives in $L^p(D)$, considered as a subset of $L^p(D)$. Then there exists a constant $c>0$, such that

$$\mathcal{H}(\mathrm{Lip}_1(\mathcal{K});\epsilon)\gtrsim\exp(c\epsilon^{-d/s}).$$
Proof.

The metric entropy of $\mathcal{U}(W^{s,p}(D))$ with respect to the $L^p$-norm is lower bounded by [7]:

$$\mathcal{H}(\mathcal{K};\epsilon)_{L^p(D)}\gtrsim\epsilon^{-d/s}.$$

The claim thus follows from Proposition 3.2. ∎
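The passage from a polynomial entropy bound on $\mathcal{K}$ to the exponential bound on $\mathrm{Lip}_1(\mathcal{K})$ is a one-line computation, sketched below. The constant $c_0$ is a hypothetical name for the implied constant in the entropy bound on $\mathcal{K}$, and the bound is assumed to hold at scale $6\epsilon$ for all sufficiently small $\epsilon\in(0,1/3]$.

```latex
% Assumption: \mathcal{H}(\mathcal{K};\delta) \geq c_0\,\delta^{-d/s} for all
% sufficiently small \delta > 0, with c_0 > 0 a hypothetical constant name.
\begin{align*}
\mathcal{H}(\mathrm{Lip}_1(\mathcal{K});\epsilon)_{C(\mathcal{K})}
  &\geq 2^{\mathcal{H}(\mathcal{K};6\epsilon)}
     && \text{(Proposition 3.2)} \\
  &\geq 2^{c_0 (6\epsilon)^{-d/s}}
     && \text{(entropy bound at scale } 6\epsilon\text{)} \\
  &= \exp\!\left(c\,\epsilon^{-d/s}\right),
     && c := c_0\,6^{-d/s}\ln 2 .
\end{align*}
```

The same computation applies to any compact set $\mathcal{K}$ whose metric entropy admits a lower bound of the form $\epsilon^{-d/s}$, including the Hölder case of Corollary 3.5.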

Corollary 3.5 (Lipschitz functionals on Hölder spaces).

Let $D\subset\mathbb{R}^d$ be a compact domain in Euclidean space. Let $\mathcal{K}=\mathcal{U}(C^s(D))$ be the unit ball in the space of Hölder continuous functions of order $s>0$, considered as a subset of $C(D)$. Then there exists a constant $c>0$, such that

$$\mathcal{H}(\mathrm{Lip}_1(\mathcal{K});\epsilon)\gtrsim\exp(c\epsilon^{-d/s}).$$
Proof.

The metric entropy of $\mathcal{U}(C^s(D))$ with respect to the sup-norm is lower bounded by [26]:

$$\mathcal{H}(\mathcal{K};\epsilon)_{C(D)}\gtrsim\epsilon^{-d/s}.$$

The claim thus follows from Proposition 3.2. ∎

3.3. Approximation in expectation

Besides the setting discussed in the previous section, which is relevant for the uniform approximation of operators over a compact set of input functions, another commonly studied setting is the approximation in expectation (cp. Setting 2.4): Here, we consider $1$-Lipschitz mappings $\mathcal{G}:\mathcal{X}\to\mathbb{R}$ defined on a separable Hilbert space $\mathcal{X}$. We fix a probability measure $\mu$ on $\mathcal{X}$ and consider inputs as random draws $u\sim\mu$. We assume that $\mu$ satisfies the minimal structural Assumption 2.10; under this assumption, random draws $u\sim\mu$ can be obtained from a Karhunen-Loève-like expansion, $u=\sum_{j=1}^{\infty}\sqrt{\lambda_j}Z_je_j$.

Our aim is to find lower bounds on the metric entropy of $\mathrm{Lip}_1(\mathcal{X})\subset\mathsf{\bm{V}}$, where $\mathsf{\bm{V}}=L^p(\mu)$ is the space of $L^p(\mu)$-integrable operators. The following entropy estimate represents the main novel contribution of this section:

Proposition 3.6.

Let $\mathcal{X}$ be a separable Hilbert space, and let $\mu$ be a probability measure satisfying Assumption 2.10. Let $p\in[1,\infty)$ be given. Assume that the coefficients satisfy $\sqrt{\lambda_j}\gtrsim j^{-\alpha}$ as $j\to\infty$, where $\alpha>0$. Then the metric entropy of $\mathrm{Lip}_1(\mathcal{X})$ with respect to the Bochner $L^p(\mu)$-norm obeys the following lower bound: There exist constants $c,\epsilon_0>0$, such that

$$\mathcal{H}(\mathrm{Lip}_1(\mathcal{X});\epsilon)_{L^p(\mu)}\geq\exp\left(c\epsilon^{-1/(\alpha+1)}\right),\quad\forall\,\epsilon\in(0,\epsilon_0]. \tag{3.3}$$

Our proof of Proposition 3.6 will rely on several technical lemmas, which we state and prove below. The first lemma identifies an isometric embedding $L^p([0,1]^d)\hookrightarrow L^p(\mu)$.

Lemma 3.7.

Let $\mathcal{X}$ be a separable Hilbert space. Let $\mu\in\mathcal{P}(\mathcal{X})$ satisfy Assumption 2.10, and let $p\in[1,\infty)$. Then for any $d\in\mathbb{N}$, there exists an isometric embedding,

$$\iota_d:L^p([0,1]^d)\hookrightarrow L^p(\mu), \tag{3.4}$$

such that $\iota_d(\mathrm{Lip}_1([0,1]^d))\subset\mathrm{Lip}_{L/\sqrt{\lambda_d}}(\mathcal{X})$, where the Lipschitz norm on $[0,1]^d$ is defined with respect to the $\ell^\infty$-norm on $[0,1]^d$.

Proof.

By assumption, $\mu\in\mathcal{P}(\mathcal{X})$ is the law of a random field $u:\Omega\to\mathcal{X}$ of the form,

$$u(\omega)=\sum_{j=1}^{\infty}\sqrt{\lambda_j}Z_j(\omega)e_j, \tag{3.5}$$

with $Z_j$ independent, $Z_j\sim\rho_j(z)\,dz$. To construct the claimed isometry, we define $F_j(z):=\int_{-\infty}^{z}\rho_j(\zeta)\,d\zeta$ as the cumulative distribution function of $\rho_j$. We recall that $F_j(Z_j)\sim\mathcal{U}(0,1)$ is uniformly distributed on $[0,1]$. Furthermore, we clearly have $\mathrm{Lip}(F_j)=\|\rho_j\|_{L^\infty(\mathbb{R})}\leq L$, where the last bound is by Assumption 2.10.

Given $u\in\mathcal{X}$, we define $u_j:=\langle e_j,u\rangle_{\mathcal{X}}$ the coefficients of $u$ with respect to the orthonormal basis $\{e_j\}$. Using the CDFs introduced above, $F_j:\mathbb{R}\to[0,1]$, we now define a mapping,

$$\iota_d:L^p([0,1]^d)\to L^p(\mu),\quad(\iota_df)(u):=f(F_1(u_1/\sqrt{\lambda_1}),\dots,F_d(u_d/\sqrt{\lambda_d})).$$

To see that this is well-defined, we note that, using the expansion of the random field (3.5), $u_j/\sqrt{\lambda_j}=Z_j$, and hence

$$(\iota_df)(u)=f(F_1(Z_1),\dots,F_d(Z_d)),\quad\text{for }u\sim\mu,$$

and we once again remind ourselves that $F_j(Z_j)\sim\mathcal{U}(0,1)$ is uniformly distributed on $[0,1]$, and that the $Z_j$ are independent by assumption. Thus, it follows that

$$\begin{aligned}
\mathbb{E}_{u\sim\mu}|(\iota_df)(u)|^p&=\mathbb{E}|f(F_1(Z_1),\dots,F_d(Z_d))|^p\\
&=\int_{[0,1]^d}|f(x_1,\dots,x_d)|^p\,dx\\
&=\|f\|_{L^p([0,1]^d)}^p.
\end{aligned}$$

Thus, $\|\iota_df\|_{L^p(\mu)}=\|f\|_{L^p([0,1]^d)}$. This shows that $\iota_d:L^p([0,1]^d)\to L^p(\mu)$ is an isometry as claimed. To verify that $\iota_d(\mathrm{Lip}_1([0,1]^d))\subset\mathrm{Lip}_{L/\sqrt{\lambda_d}}(\mathcal{X})$, we note that

$$h_d:(\mathcal{X},\|{\,\cdot\,}\|_{\mathcal{X}})\to([0,1]^d,\ell^\infty),\quad u\mapsto(F_1(u_1/\sqrt{\lambda_1}),\dots,F_d(u_d/\sqrt{\lambda_d})),$$

has Lipschitz constant bounded by

$$\mathrm{Lip}(h_d)\leq\max_{j=1,\dots,d}\frac{\mathrm{Lip}(F_j)}{\sqrt{\lambda_j}}\leq\frac{L}{\sqrt{\lambda_d}}.$$

Thus, for any $f\in\mathrm{Lip}_1([0,1]^d)=\mathrm{Lip}_1(([0,1]^d,\ell^\infty))$,

$$\mathrm{Lip}(\iota_df)=\mathrm{Lip}(f\circ h_d)\leq\mathrm{Lip}(f)\,\mathrm{Lip}(h_d)\leq\frac{L}{\sqrt{\lambda_d}}.$$

Furthermore, we also have $\|\iota_df\|_{C(\mathcal{X})}\leq\|f\|_{C([0,1]^d)}\leq 1$. This shows that

$$\|\iota_df\|_{\mathrm{Lip}}=\max\left\{\|\iota_df\|_{C(\mathcal{X})},\mathrm{Lip}(\iota_df)\right\}\leq\max\left\{1,\frac{L}{\sqrt{\lambda_d}}\right\}=\frac{L}{\sqrt{\lambda_d}}.$$

Here, we have made use of the choice $L>\sqrt{\lambda_1}\geq\sqrt{\lambda_d}$ (cp. (2.20)) in the last step. This concludes our proof. ∎

As a consequence of Lemma 3.7, we have:

Corollary 3.8.

Under the assumptions of Lemma 3.7, we have

\[
\mathcal{H}(\mathrm{Lip}_1(\mathcal{X});\epsilon)_{L^p(\mu)} \ge \mathcal{H}\left(\mathrm{Lip}_1([0,1]^d);\frac{L\epsilon}{\sqrt{\lambda_d}}\right)_{L^p([0,1]^d)}, \tag{3.6}
\]

for any $d\in\mathbb{N}$.

Proof.

We recall the existence of an isometric embedding $\iota_d: L^p([0,1]^d) \to L^p(\mu)$ from Lemma 3.7, with $\iota_d(\mathrm{Lip}_1([0,1]^d)) \subset \mathrm{Lip}_{L/\sqrt{\lambda_d}}(\mathcal{X})$. It follows that

\begin{align*}
\mathcal{N}(\mathrm{Lip}_1(\mathcal{X});\epsilon)_{L^p(\mu)}
&= \mathcal{N}(\mathrm{Lip}_{L/\sqrt{\lambda_d}}(\mathcal{X});L\epsilon/\sqrt{\lambda_d})_{L^p(\mu)} \\
&\ge \mathcal{N}(\iota_d(\mathrm{Lip}_1([0,1]^d));L\epsilon/\sqrt{\lambda_d})_{L^p(\mu)} \\
&= \mathcal{N}(\mathrm{Lip}_1([0,1]^d);L\epsilon/\sqrt{\lambda_d})_{L^p([0,1]^d)}.
\end{align*}

Taking logarithms, the claimed inequality between the metric entropies follows. ∎
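The first equality in the chain above rests on a rescaling of covering numbers. The following is a brief sketch of that step, assuming (as the notation suggests, although the definition lies outside this section) that $\mathrm{Lip}_R(\mathcal{X})$ denotes the ball of radius $R$ with respect to $\|\cdot\|_{\mathrm{Lip}}$, and writing $B_\epsilon(f)$ for the $\epsilon$-ball around $f$ in $L^p(\mu)$:

```latex
% Assumption: Lip_R(X) = { f : ||f||_Lip <= R } = R * Lip_1(X).
\[
\mathrm{Lip}_1(\mathcal{X}) \subset \bigcup_{k=1}^{n} B_{\epsilon}(f_k)
\quad\Longleftrightarrow\quad
\mathrm{Lip}_R(\mathcal{X}) = R\,\mathrm{Lip}_1(\mathcal{X})
\subset \bigcup_{k=1}^{n} B_{R\epsilon}(R f_k),
\]
\[
\text{so that}\quad
\mathcal{N}(\mathrm{Lip}_1(\mathcal{X});\epsilon)_{L^p(\mu)}
= \mathcal{N}(\mathrm{Lip}_R(\mathcal{X});R\epsilon)_{L^p(\mu)},
\qquad R = \frac{L}{\sqrt{\lambda_d}}.
\]
```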

The proof of Proposition 3.6 will furthermore make use of the following result in the finite-dimensional setting:

Lemma 3.9.

Let $p\in[1,\infty)$ be given. For $d\in\mathbb{N}$, consider $\mathrm{Lip}_1([0,1]^d) \subset L^p([0,1]^d)$. Then there exists a constant $c>0$, independent of $d$, such that we have the following lower bound on the metric entropy:

\[
\mathcal{H}(\mathrm{Lip}_1([0,1]^d);\epsilon)_{L^p([0,1]^d)} \ge \frac{1}{8}\left(\frac{c}{d\epsilon}\right)^d, \quad \forall\,\epsilon\in\left(0,\frac{c}{d}\right]. \tag{3.7}
\]

Proof.

Since the Hölder inequality implies, for any $p\in[1,\infty)$, that $\|f\|_{L^1([0,1]^d)} \le \|f\|_{L^p([0,1]^d)}$, it follows that any covering of $\mathrm{Lip}_1([0,1]^d)$ by $\epsilon$-balls with respect to the $L^p$-norm also gives rise to a covering of $\mathrm{Lip}_1([0,1]^d)$ by $\epsilon$-balls with respect to the $L^1$-norm (with the same centers). In particular, this implies that

\[
\mathcal{N}(\mathrm{Lip}_1([0,1]^d);\epsilon)_{L^p([0,1]^d)} \ge \mathcal{N}(\mathrm{Lip}_1([0,1]^d);\epsilon)_{L^1([0,1]^d)},
\]

and we only need to establish (3.7) for $p=1$.
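The norm comparison used in this reduction can be checked numerically. The snippet below is only an illustrative sketch: it replaces $[0,1]^d$ by a finite sample equipped with the uniform probability measure, and the helper name `lp_norm` is hypothetical.

```python
import random


def lp_norm(values, p):
    """Discrete L^p norm with respect to the uniform probability measure on the samples."""
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1.0 / p)


random.seed(0)
# Stand-in for the values of a function f sampled at random points of [0,1]^d.
samples = [random.uniform(-1.0, 1.0) for _ in range(10_000)]

# On a probability space, the L^p norms are non-decreasing in p.
norms = {p: lp_norm(samples, p) for p in (1, 2, 4, 8)}
```

Since an $L^p$-ball of radius $\epsilon$ is contained in the $L^1$-ball with the same center and radius, every $L^p$-covering is also an $L^1$-covering, which is the inequality between covering numbers stated above.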

For $\lambda\in(0,1)$, define $\phi_\lambda: [0,1]^d \to \mathbb{R}_+$ as a composition $g_\lambda \circ \|{\,\cdot\,}\|_{\ell^\infty}$, where $g_\lambda: \mathbb{R}\to\mathbb{R}$ is a piecewise linear function (approximately $g_\lambda \approx 1_{[0,1]}$) with values

\[
g_\lambda(x) := \begin{cases} 0, & (x\notin[0,1]), \\ 1, & (x\in[\lambda/2,1-\lambda/2]), \end{cases}
\]

and $g_\lambda$ interpolates linearly between $0$ and $1$ on $[0,\lambda/2]$, and from $1$ to $0$ on $[1-\lambda/2,1]$. By construction, $g_\lambda$ is $2/\lambda$-Lipschitz. Since $x\mapsto\|x\|_{\ell^\infty}$ is $1$-Lipschitz, it follows that $\mathrm{Lip}(\phi_\lambda) = \mathrm{Lip}(g_\lambda\circ\|{\,\cdot\,}\|_{\ell^\infty}) \le 2/\lambda$. Clearly, smaller $\lambda$ leads to a larger Lipschitz constant. However, by construction of $\phi_\lambda$, we have $\phi_\lambda \ge 1_{[\lambda/2,1-\lambda/2]^d}$. In particular, this implies that $\|\phi_\lambda\|_{L^1} \ge (1-\lambda)^d$. Thus, smaller $\lambda$ increases the $L^1$-norm of $\phi_\lambda$.
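The trade-off between the Lipschitz constant and the $L^1$-norm can be explored numerically. The following is a minimal sketch, not part of the original argument: the function names `g` and `phi`, the value $\lambda = 0.2$, the dimension $d = 2$ and the grid sizes are all illustrative assumptions.

```python
def g(x, lam):
    """Piecewise linear profile g_lambda: 0 outside [0, 1], 1 on [lam/2, 1 - lam/2]."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    if x < lam / 2:
        return 2.0 * x / lam            # linear ramp from 0 to 1
    if x > 1.0 - lam / 2:
        return 2.0 * (1.0 - x) / lam    # linear ramp from 1 to 0
    return 1.0


def phi(point, lam):
    """phi_lambda = g_lambda composed with the sup norm of the point."""
    return g(max(abs(c) for c in point), lam)


lam = 0.2

# Largest finite-difference slope of g_lambda on a uniform grid (should not exceed 2 / lam).
h = 1e-3
max_slope = max(abs(g((i + 1) * h, lam) - g(i * h, lam)) / h for i in range(1000))

# Midpoint-rule approximation of the L^1 norm of phi_lambda on [0,1]^2.
n = 200
l1_norm = sum(
    phi(((i + 0.5) / n, (j + 0.5) / n), lam) for i in range(n) for j in range(n)
) / n**2
```

Comparing `max_slope` with `2 / lam` and `l1_norm` with `(1 - lam) ** 2` illustrates the two bounds derived above for this particular choice of parameters.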

Given $N\in\mathbb{N}$, we now subdivide $[0,1]^d$ into $N^d$ cubes of equal side length, indexed by $j\in[N]^d$, where $[N]^d = \{1,\dots,N\}^d$. For any multi-index $j\in[N]^d$, we define $\phi_{\lambda,j}(x)$ as a rescaled and translated copy of $\phi_\lambda$, such that the support of $\phi_{\lambda,j}$ coincides with the $j$-th cube. In particular, by construction of $\phi_\lambda$, this implies that

\begin{align*}
\|\phi_{\lambda,j}\|_{L^1([0,1]^d)} &\ge (1-\lambda)^d N^{-d}, \tag{3.8} \\
\mathrm{Lip}(\phi_{\lambda,j}) &\le 2N\lambda^{-1}. \tag{3.9}
\end{align*}

We also note that the $\phi_{\lambda,j}$ have essentially disjoint supports. For $\sigma\in\{-1,1\}^{[N]^d}$, we now define

\[
f_\sigma(x) = \frac{\lambda}{2N}\sum_{j\in[N]^d}\sigma_j\phi_{\lambda,j}(x).
\]

The factor in front of the sum ensures that $\mathrm{Lip}(f_\sigma)\le 1$. Furthermore, we also note that $\|f_\sigma\|_{C([0,1]^d)} \le \lambda/2N \le 1$ for any choice of $\lambda\in(0,1)$ and $N\in\mathbb{N}$. In particular, we have $f_\sigma\in\mathrm{Lip}_1([0,1]^d)$, for any choice of $\sigma$. We finally observe that, due to the disjoint supports of the $\phi_{\lambda,j}$, we have, for any $\sigma,\sigma'\in\{-1,1\}^{[N]^d}$,

\begin{align*}
\|f_\sigma - f_{\sigma'}\|_{L^1([0,1]^d)}
&= \frac{\lambda}{2N}\sum_{j\in[N]^d}|\sigma_j-\sigma'_j|\,\|\phi_{\lambda,j}\|_{L^1([0,1]^d)} \\
&\ge \lambda(1-\lambda)^d N^{-1}\frac{\#\{\sigma_j\ne\sigma_j'\}}{N^d}.
\end{align*}
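To make the construction of $f_\sigma$ concrete, the sketch below evaluates it in dimension $d = 1$ and compares the $L^1$-distance of two sign patterns with the lower bound just derived. Everything here is an illustrative assumption (the names `f_sigma` and `l1_distance`, the values $\lambda = 0.5$ and $N = 4$, and the midpoint quadrature); it is not part of the proof.

```python
def g(x, lam):
    """Piecewise linear profile g_lambda: 0 outside [0, 1], 1 on [lam/2, 1 - lam/2]."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    if x < lam / 2:
        return 2.0 * x / lam
    if x > 1.0 - lam / 2:
        return 2.0 * (1.0 - x) / lam
    return 1.0


def f_sigma(x, sigma, lam):
    """f_sigma on [0, 1] (d = 1) with N = len(sigma) cells of length 1/N."""
    N = len(sigma)
    j = min(int(x * N), N - 1)               # zero-based index of the cell containing x
    # Rescaled and translated bump on cell j, weighted by the sign sigma[j].
    return lam / (2 * N) * sigma[j] * g(N * x - j, lam)


def l1_distance(sigma, sigma_prime, lam, n_grid=20_000):
    """Midpoint-rule approximation of the L^1([0,1]) distance of f_sigma and f_sigma'."""
    total = 0.0
    for i in range(n_grid):
        x = (i + 0.5) / n_grid
        total += abs(f_sigma(x, sigma, lam) - f_sigma(x, sigma_prime, lam))
    return total / n_grid


lam = 0.5
sigma = [1, 1, -1, 1]
sigma_prime = [1, -1, -1, -1]
N = len(sigma)

n_diff = sum(a != b for a, b in zip(sigma, sigma_prime))
distance = l1_distance(sigma, sigma_prime, lam)
# Lower bound lam * (1 - lam)^d * N^{-1} * n_diff / N^d with d = 1.
lower_bound = lam * (1 - lam) / N * n_diff / N
```

In this one-dimensional example the bump on each cell is a trapezoid, so the distance can also be computed by hand as $\frac{\lambda}{2N}\cdot 2\cdot\#\{\sigma_j\ne\sigma_j'\}\cdot\frac{1-\lambda/2}{N}$, which exceeds the bound because $1-\lambda/2 \ge 1-\lambda$.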

The last quotient is the fraction of entries in which $\sigma$ and $\sigma'$ differ. It turns out that there exists a subset $\Xi\subset\{-1,1\}^{[N]^d}$ such that any $\sigma\ne\sigma'$ belonging to $\Xi$ differ on a substantial fraction of their components. More precisely, as noted in [1] as a consequence of the Gilbert-Varshamov bound, $\Xi$ can be chosen such that any two distinct elements $\sigma,\sigma'\in\Xi$ differ on at least a fourth of their coordinates,

\[
\frac{\#\{\sigma_j\ne\sigma_j'\}}{N^d} \ge \frac{1}{4}, \qquad \forall\,\sigma,\sigma'\in\Xi,\;\sigma\ne\sigma', \tag{3.10}
\]

and the cardinality of $\Xi$ is lower bounded by

\[
\#\Xi \ge \exp(N^d/8) \ge 2^{N^d/8}. \tag{3.11}
\]
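For very small index sets, a well-separated family of sign vectors can be found by brute force. The sketch below uses a simple greedy search, which is an assumption made for illustration and not the argument behind the Gilbert-Varshamov bound cited in [1]; the names `hamming` and `greedy_code` and the choice $n = 8$ (playing the role of $N^d$) are hypothetical.

```python
import math
from itertools import product


def hamming(a, b):
    """Number of coordinates in which the sign vectors a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_code(n, min_dist):
    """Greedily collect vectors in {-1, 1}^n with pairwise Hamming distance >= min_dist."""
    code = []
    for candidate in product((-1, 1), repeat=n):
        if all(hamming(candidate, word) >= min_dist for word in code):
            code.append(candidate)
    return code


n = 8                                    # plays the role of N^d
code = greedy_code(n, math.ceil(n / 4))  # enforce separation on at least a fourth of the entries

# Smallest pairwise distance actually achieved by the greedy family.
min_pairwise = min(
    hamming(a, b) for i, a in enumerate(code) for b in code[i + 1:]
)
```

The resulting family can then be compared against (3.10) via `min_pairwise / n` and against (3.11) via `len(code)` and `math.exp(n / 8)`. The search enumerates all $2^n$ sign vectors, so it is only feasible for small $n$.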

This implies that for any two $\sigma\ne\sigma'$ in $\Xi$, we have

\[
\|f_\sigma - f_{\sigma'}\|_{L^1([0,1]^d)} \ge \frac{1}{4N}\lambda(1-\lambda)^d.
\]

Optimizing the right-hand side over $\lambda\in(0,1)$, we set $\lambda = 1/(1+d)$ to obtain

\[
\|f_\sigma - f_{\sigma'}\|_{L^1([0,1]^d)} \ge \frac{1}{4(d+1)N}\frac{1}{(1+1/d)^d} \ge \frac{1}{4e(d+1)N} \ge \frac{1}{8edN},
\]

where we used that Euler's number satisfies $e \ge (1+1/d)^d$, and the fact that $d\ge 1$ implies $d+1\le 2d$ in the last bound.
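The choice $\lambda = 1/(1+d)$ follows from setting the derivative of $\lambda(1-\lambda)^d$ to zero, which gives $1-\lambda = d\lambda$. A small numerical cross-check is sketched below; the names `objective` and `grid_argmax` and the grid resolution are illustrative assumptions.

```python
import math


def objective(lam, d):
    """The quantity lam * (1 - lam)^d that is maximised over lam in (0, 1)."""
    return lam * (1.0 - lam) ** d


def grid_argmax(d, n_grid=100_000):
    """Grid search for the maximiser of the objective over (0, 1)."""
    return max((i / n_grid for i in range(1, n_grid)), key=lambda lam: objective(lam, d))


# Pairs (numerical maximiser, closed-form maximiser 1 / (d + 1)) for a few dimensions.
results = {d: (grid_argmax(d), 1.0 / (d + 1)) for d in (1, 2, 5, 10)}

# Smallest ratio of the optimal value to the simplified bound 1 / (2 e d); should be >= 1.
worst_ratio = min(
    objective(1.0 / (d + 1), d) / (1.0 / (2.0 * math.e * d)) for d in range(1, 200)
)
```

The two entries of each pair in `results` should agree up to the grid spacing, and `worst_ratio` staying above one reflects the chain of inequalities $\lambda(1-\lambda)^d \ge 1/(e(d+1)) \ge 1/(2ed)$ at the optimal $\lambda$.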

Taking into account the bound (3.11), it follows that the packing number $\mathcal{M}(\mathrm{Lip}_1([0,1]^d);\epsilon)$ satisfies the lower bound

\[
\log_2\mathcal{M}(\mathrm{Lip}_1([0,1]^d);(\beta_d N)^{-1}) \ge N^d/8, \quad \forall\,N\in\mathbb{N},
\]

where we have defined $\beta_d = 8ed$. Given $\epsilon\in(0,\beta_d^{-1}]$, we can find $N\in\mathbb{N}$, such that

\[
(\beta_d N)^{-1} \ge \epsilon \ge (2\beta_d N)^{-1}.
\]

It follows that

\begin{align*}
\log_2\mathcal{M}(\mathrm{Lip}_1([0,1]^d);\epsilon)
&\ge \log_2\mathcal{M}(\mathrm{Lip}_1([0,1]^d);(2\beta_d N)^{-1}) \\
&\ge \frac{(2N)^d}{8} \ge \frac{1}{8}\left(\frac{\beta_d\epsilon}{2}\right)^{-d}.
\end{align*}

We conclude that

\[
\log_2\mathcal{M}(\mathrm{Lip}_1([0,1]^d);\epsilon) \ge \frac{1}{8}\left(\frac{\beta_d}{2}\epsilon\right)^{-d}, \quad \forall\,\epsilon\in(0,\beta_d^{-1}].
\]

This lower bound on the packing number holds for any dimension $d\in\mathbb{N}$. We can now use the general relation $\mathcal{N}(A;\epsilon) \ge \mathcal{M}(A;2\epsilon)$ between the covering and packing numbers (3.1), to conclude that

\[
\mathcal{H}(\mathrm{Lip}_1([0,1]^d);\epsilon) = \log_2\mathcal{N}(\mathrm{Lip}_1([0,1]^d);\epsilon) \ge \frac{1}{8}\left(\beta_d\epsilon\right)^{-d}, \quad \forall\,\epsilon\in(0,\beta_d^{-1}],
\]

where $\beta_d = 8ed$. This proves the claim with $c = 1/(8e)$, i.e.

\[
\log_2\mathcal{N}(\mathrm{Lip}_1([0,1]^d);\epsilon) \ge \frac{1}{8}\left(\frac{c}{d\epsilon}\right)^d, \quad \forall\,\epsilon\in\left(0,\frac{c}{d}\right].
\]
∎

Assuming the results of Corollary 3.8 and Lemma 3.9, we can now prove Proposition 3.6.

Proof of Proposition 3.6.

Combining the lower bounds (3.6) and (3.7), we obtain that for any $d\in\mathbb{N}$,

\[
\log_2\mathcal{N}(\mathrm{Lip}_1(\mathcal{X});\epsilon)_{L^p(\mu)} \ge \frac{1}{8}\left(\frac{c\sqrt{\lambda_d}}{Ld\epsilon}\right)^d \ge \left(\frac{c\sqrt{\lambda_d}}{8Ld\epsilon}\right)^d,
\]

provided that $\epsilon \le \frac{c\sqrt{\lambda_d}}{Ld}$. Since $\lambda_d \gtrsim d^{-2\alpha}$ by assumption, and since $C$ and $L$ are constants independent of $d$, it thus follows that there exist $c_1,c_2>0$, independent of $d$, such that

\[
\log_2\mathcal{N}(\mathrm{Lip}_1(\mathcal{X});\epsilon)_{L^p(\mu)} \ge \left(\frac{c_1}{d^{1+\alpha}\epsilon}\right)^d, \quad \text{if } \epsilon \le c_2 d^{-(1+\alpha)}. \tag{3.12}
\]

The idea is now to choose d=d(ϵ)ϵ1/(α+1)𝑑𝑑italic-ϵsimilar-tosuperscriptitalic-ϵ1𝛼1d=d(\epsilon)\sim\epsilon^{-1/(\alpha+1)}italic_d = italic_d ( italic_ϵ ) ∼ italic_ϵ start_POSTSUPERSCRIPT - 1 / ( italic_α + 1 ) end_POSTSUPERSCRIPT, such that the term inside the parentheses is lower bounded by eβsuperscript𝑒𝛽e^{\beta}italic_e start_POSTSUPERSCRIPT italic_β end_POSTSUPERSCRIPT for some fixed β>0𝛽0\beta>0italic_β > 0, implying that the right hand side is (eβ)d=exp(βd)exp(cϵ1/(α+1))greater-than-or-equivalent-toabsentsuperscriptsuperscript𝑒𝛽𝑑𝛽𝑑greater-than-or-equivalent-to𝑐superscriptitalic-ϵ1𝛼1\gtrsim(e^{\beta})^{d}=\exp(\beta d)\gtrsim\exp(c\epsilon^{-1/(\alpha+1)})≳ ( italic_e start_POSTSUPERSCRIPT italic_β end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT = roman_exp ( italic_β italic_d ) ≳ roman_exp ( italic_c italic_ϵ start_POSTSUPERSCRIPT - 1 / ( italic_α + 1 ) end_POSTSUPERSCRIPT ) for some constant c>0𝑐0c>0italic_c > 0. This then leads to the claimed lower bound. We now proceed to provide the details of the required argument.

We first fix $\beta=-\log(c_{2}/c_{1})$, such that

$$
e^{-\beta}=c_{2}/c_{1}. \tag{3.13}
$$

We next define

$$
\epsilon_{0}=c_{1}e^{-\beta}=c_{2}. \tag{3.14}
$$

Since $c_{1},c_{2}$ are independent of $d$, it follows that also $\beta$ and $\epsilon_{0}$ are independent of $d$.

For any $\epsilon\in(0,\epsilon_{0}]$, the above choice ensures that

$$
\epsilon\leq\epsilon_{0}\leq c_{1}e^{-\beta},
$$

and hence there exists $d=d(\epsilon)\in\mathbb{N}$ (for definiteness, the largest integer satisfying the first inequality below), such that

$$
\epsilon d^{(1+\alpha)}\leq c_{1}e^{-\beta}<\epsilon(2d)^{(1+\alpha)}.
$$

In particular, upon rearranging the first inequality in the last display, we obtain the two equivalent formulations,

\begin{align}
\epsilon &\leq c_{1}e^{-\beta}d^{-(1+\alpha)}=c_{2}d^{-(1+\alpha)}, \tag{3.15}\\
\frac{c_{1}}{d^{(1+\alpha)}\epsilon} &\geq e^{\beta}, \tag{3.16}
\end{align}

while the second bound $c_{1}e^{-\beta}<\epsilon(2d)^{(1+\alpha)}$ implies,

$$
\beta d\geq c\epsilon^{-1/(\alpha+1)},\quad\text{where }c:=\frac{\beta}{2}\left[\frac{c_{1}}{e^{\beta}}\right]^{1/(\alpha+1)}. \tag{3.17}
$$

With this choice of $d=d(\epsilon)$, inequality (3.15) guarantees that the estimate in (3.12) applies to all $\epsilon\in(0,\epsilon_{0}]$. This in turn implies that

\begin{align*}
\mathcal{H}(\mathrm{Lip}_{1}(\mathcal{X});\epsilon)_{L^{p}(\mu)}
&=\log_{2}\mathcal{N}(\mathrm{Lip}_{1}(\mathcal{X});\epsilon)_{L^{p}(\mu)}\\
&\overset{(3.12)}{\geq}\left(\frac{c_{1}}{d^{(1+\alpha)}\epsilon}\right)^{d}\\
&\overset{(3.16)}{\geq}e^{\beta d}\\
&\overset{(3.17)}{\geq}\exp\left(c\epsilon^{-1/(\alpha+1)}\right),
\end{align*}

for all $\epsilon\in(0,\epsilon_{0}]$. This is the claimed lower bound on the metric entropy. ∎
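The choice of $d(\epsilon)$ in this proof can be checked numerically. The following Python sketch is illustrative only: the values of `c1`, `c2` and `alpha` are hypothetical placeholders (the proof only asserts that such constants exist), and the helper names `choose_d` and `check_choice` are not from the paper. It picks the largest integer $d$ with $\epsilon d^{1+\alpha}\leq c_{1}e^{-\beta}$ and evaluates the inequalities used above.

```python
import math

def choose_d(eps, c2, alpha):
    """Largest integer d >= 1 with eps * d**(1 + alpha) <= c2.

    Here c2 plays the role of c1 * exp(-beta), see (3.13)-(3.14).
    Assumes 0 < eps <= c2, so that d = 1 is always admissible.
    """
    d = max(1, math.floor((c2 / eps) ** (1.0 / (1.0 + alpha))))
    # Correct possible floating-point rounding at the boundary.
    while d > 1 and eps * d ** (1 + alpha) > c2:
        d -= 1
    while eps * (d + 1) ** (1 + alpha) <= c2:
        d += 1
    return d

def check_choice(eps, c1=4.0, c2=1.0, alpha=1.0):
    """Return d and the truth values of the inequalities used in the proof."""
    d = choose_d(eps, c2, alpha)
    exp_beta = c1 / c2  # e^beta, by (3.13)
    power = d ** (1 + alpha)
    return {
        "d": d,
        "first": eps * power <= c2,                   # rearranges to (3.15)
        "second": c2 < eps * (2 * d) ** (1 + alpha),  # upper half of the sandwich
        "base": c1 / (power * eps) >= exp_beta,       # (3.16)
        # d grows at least like eps^(-1/(1+alpha)), cf. (3.17)
        "growth": d > 0.5 * (c2 / eps) ** (1 / (1 + alpha)),
        "beta_d": math.log(exp_beta) * d,             # exponent beta * d
    }

for eps in (0.5, 0.1, 0.03, 0.001):
    print(eps, check_choice(eps))
```

As $\epsilon$ decreases, the selected $d$ grows and so does the exponent $\beta d$, which is the mechanism behind the exponential entropy bound.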

4. Generic approximation results

We first discuss an abstract formulation of a general “approximation task”. Let $\mathsf{\bm{V}}$ be a Banach space (e.g. a space of operators). In a general non-linear approximation task, we are given for any $n\in\mathbb{N}$ a set $\Sigma_{n}\subset\mathsf{\bm{V}}$ over which we aim to approximate an element $f\in\mathsf{\bm{V}}$, where we will assume that $f$ belongs to a general class $\mathsf{\bm{A}}\subset\mathsf{\bm{V}}$ of interest. Considering these subsets $\Sigma_{n}\subset\mathsf{\bm{V}}$ fixed, and given a sequence $\epsilon_{n}\to 0$, we will say that $f\in\mathsf{\bm{A}}$ can be approximated with convergence rate $\epsilon_{n}$, if there exists a constant $M_{f}>0$, such that

$$
\inf_{\psi_{n}\in\Sigma_{n}}\|f-\psi_{n}\|_{\mathsf{\bm{V}}}\leq M_{f}\epsilon_{n},\quad\forall\,n\in\mathbb{N}. \tag{4.1}
$$

Specifically, we will be most interested in the logarithmic case $\epsilon_{n}=\log(n)^{-\gamma}$, in the following, with $\Sigma_{n}$ corresponding to all possible realizations of a fixed bit-encoded neural operator architecture (cp. the proofs of Propositions 2.13 and 2.14, respectively).
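To make the cardinality constraint on $\Sigma_{n}$ concrete, the following Python sketch enumerates all realizations of a toy model whose parameters are stored with a fixed number of bits each. The uniform quantization grid and the function name `realizations` are assumptions made purely for illustration and are not the encoding used in the paper. The only point is that a model stored in $n$ bits has at most $2^{n}$ distinct realizations.

```python
import itertools

def realizations(num_params, bits_per_param, lo=-1.0, hi=1.0):
    """Enumerate all parameter vectors of a toy model in which each
    parameter is stored with `bits_per_param` >= 1 bits on a uniform
    grid in [lo, hi] (a hypothetical encoding)."""
    levels = 2 ** bits_per_param
    grid = [lo + (hi - lo) * k / (levels - 1) for k in range(levels)]
    return list(itertools.product(grid, repeat=num_params))

num_params, bits_per_param = 3, 2
n = num_params * bits_per_param  # total number of encoded bits
sigma_n = realizations(num_params, bits_per_param)
print(len(sigma_n), 2 ** n)  # prints "64 64"
```

Distinct parameter vectors may still realize the same function, so the number of distinct elements of $\Sigma_{n}$ can be smaller than $2^{n}$, but never larger.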

Coming back to the general abstract setting above, and given $M>0$, we introduce a set of “efficiently approximated” elements $\mathsf{\bm{E}}_{M}\subset\mathsf{\bm{A}}$ with bound $M$, i.e.

$$
\mathsf{\bm{E}}_{M}:=\left\{f\in\mathsf{\bm{A}}\,\middle|\,\text{inequality (4.1) holds with constant $M_{f}=M$}\right\}. \tag{4.2}
$$

And we denote the set of all $f\in\mathsf{\bm{A}}$ which can be approximated at convergence rate $\epsilon_{n}$, by elements in $\Sigma_{n}$, by

$$
\mathsf{\bm{E}}=\bigcup_{M>0}\mathsf{\bm{E}}_{M}=\left\{f\in\mathsf{\bm{A}}\,\middle|\,\text{there exists $M_{f}$ such that (4.1) holds}\right\}. \tag{4.3}
$$

Our goal is to study generically achievable approximation rates $\epsilon_{n}$, in terms of the complexity of $\mathsf{\bm{A}}$, as measured by its metric entropy.

The following lemma will be fundamental to our analysis:

Lemma 4.1.

Let $\mathsf{\bm{V}}$ be a Banach space. Let $\mathsf{\bm{A}}\subset\mathsf{\bm{V}}$ be a compact, convex subset. Let $\{\Sigma_{n}\}_{n\in\mathbb{N}}$ be a family of subsets $\Sigma_{n}\subset\mathsf{\bm{V}}$, with $|\Sigma_{n}|\leq 2^{n}$ elements. Fix $M>0$. If $\mathsf{\bm{E}}_{M}\subset\mathsf{\bm{A}}$ given by (4.2) has non-empty interior in the subspace topology on $\mathsf{\bm{A}}$, then there exists a constant $\lambda>0$, independent of $n$, such that the metric entropy satisfies the bound,

$$
\mathcal{H}(\mathsf{\bm{A}};\lambda\epsilon_{n})_{\mathsf{\bm{V}}}\leq n,\quad\forall\,n\in\mathbb{N}.
$$
Proof.

At the outset we note that by compactness, we have a uniform upper bound,

$$
\sup_{f\in\mathsf{\bm{A}}}\|f\|\leq C_{\mathsf{\bm{A}}}<\infty.
$$

Upon a simple rescaling, we may wlog assume that $C_{\mathsf{\bm{A}}}=1$, i.e. that $\|f\|\leq 1$ for all $f\in\mathsf{\bm{A}}$. This will be assumed in the following proof.

By assumption, the set $\mathsf{\bm{E}}_{M}$ defined by (4.2) has non-empty interior in the subspace topology on $\mathsf{\bm{A}}$. Hence there exist $f_{0}\in\mathsf{\bm{A}}$ and $\delta>0$, such that

$$
B_{\delta}(f_{0})\subset\mathsf{\bm{E}}_{M}\subset\bigcup_{\psi_{n}\in\Sigma_{n}}\overline{B_{M\epsilon_{n}}(\psi_{n})},
$$

where $B_{\delta}(f_{0})=\left\{f\in\mathsf{\bm{A}}\,\middle|\,\|f-f_{0}\|<\delta\right\}\subset\mathsf{\bm{A}}$ is an open ball in the subspace topology on $\mathsf{\bm{A}}$. Thus, for any $n\in\mathbb{N}$, we obtain the following bound on the covering numbers,

$$
\mathcal{N}(B_{\delta}(f_{0});M\epsilon_{n})\leq\mathcal{N}(\mathsf{\bm{E}}_{M};M\epsilon_{n})\leq|\Sigma_{n}|\leq 2^{n}. \tag{4.4}
$$

We next recall that we have wlog assumed $\sup_{f\in\mathsf{\bm{A}}}\|f\|\leq 1$, and we recall that $\mathsf{\bm{A}}$ is convex by assumption. In particular, we next show that this implies that

$$
\left(1-\frac{\delta}{3}\right)f_{0}+\frac{\delta}{3}\mathsf{\bm{A}}\subset B_{\delta}(f_{0}).
$$

To see why, let $\delta^{\prime}=\delta/3$ and fix $f\in\mathsf{\bm{A}}$ arbitrary; shrinking $\delta$ if necessary, we may assume $\delta\leq 3$, so that $\delta^{\prime}\in(0,1]$. We need to show that $f_{\delta^{\prime}}:=(1-\delta^{\prime})f_{0}+\delta^{\prime}f\in B_{\delta}(f_{0})$. Since $\mathsf{\bm{A}}$ is convex, it is clear that $f_{\delta^{\prime}}\in\mathsf{\bm{A}}$. In addition, we also have

$$
\|f_{\delta^{\prime}}-f_{0}\|=\|(1-\delta^{\prime})f_{0}+\delta^{\prime}f-f_{0}\|=\delta^{\prime}\|f-f_{0}\|\leq 2\delta^{\prime}=\frac{2\delta}{3}<\delta.
$$

Hence, $f_{\delta^{\prime}}\in B_{\delta}(f_{0})$ as claimed. The inclusion $(1-\delta/3)f_{0}+(\delta/3)\mathsf{\bm{A}}\subset B_{\delta}(f_{0})$ now implies,

$$
\mathcal{N}(B_{\delta}(f_{0});M\epsilon_{n})\geq\mathcal{N}((\delta/3)\mathsf{\bm{A}};M\epsilon_{n})=\mathcal{N}(\mathsf{\bm{A}};3M\epsilon_{n}/\delta). \tag{4.5}
$$

Combining (4.4) and (4.5), we conclude that

$$
\mathcal{H}(\mathsf{\bm{A}};3M\epsilon_{n}/\delta)_{\mathsf{\bm{V}}}=\log_{2}\mathcal{N}(\mathsf{\bm{A}};3M\epsilon_{n}/\delta)_{\mathsf{\bm{V}}}\leq n,\quad\forall\,n\in\mathbb{N}.
$$

We emphasize that $M,\delta>0$ are independent of $n$ in the above argument. In particular, the claim of the lemma holds with constant $\lambda=3M/\delta>0$. ∎
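The convexity step in this proof can be illustrated in finite dimensions. The sketch below is a sanity check under stated assumptions, not part of the argument: it takes the closed Euclidean unit ball in $\mathbb{R}^{5}$ as a stand-in for $\mathsf{\bm{A}}$ (compact, convex, norm bounded by 1), and the helper names are hypothetical. It samples pairs $f_{0},f$ and measures how far the convex combination $(1-\delta/3)f_{0}+(\delta/3)f$ moves away from $f_{0}$.

```python
import math
import random

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def sample_unit_ball(k, rng):
    """Draw a point from the closed Euclidean unit ball in R^k."""
    g = [rng.gauss(0.0, 1.0) for _ in range(k)]
    radius = rng.random() ** (1.0 / k)
    scale = radius / norm(g)
    return [scale * x for x in g]

def max_shift(delta, k=5, trials=2000, seed=0):
    """Largest observed distance ||f_dp - f0||, where dp = delta / 3."""
    rng = random.Random(seed)
    dp = delta / 3.0
    worst = 0.0
    for _ in range(trials):
        f0 = sample_unit_ball(k, rng)
        f = sample_unit_ball(k, rng)
        f_dp = [(1.0 - dp) * a + dp * b for a, b in zip(f0, f)]
        # Convexity keeps the combination inside the unit ball.
        assert norm(f_dp) <= 1.0 + 1e-12
        worst = max(worst, norm([a - b for a, b in zip(f_dp, f0)]))
    return worst

delta = 0.3
worst = max_shift(delta)
# The proof gives the bound 2 * delta / 3, which is strictly below delta.
print(worst <= 2 * delta / 3 + 1e-12, worst < delta)
```

The observed shift never exceeds $2\delta/3$, in line with the estimate $\|f_{\delta^{\prime}}-f_{0}\|\leq 2\delta^{\prime}$ used in the proof.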

Proposition 4.2 (Exponential scaling).

Let $\mathsf{\bm{V}}$ be a Banach space. Let $\mathsf{\bm{A}}\subset\mathsf{\bm{V}}$ be a compact, convex subset. Assume that there exist constants $C,c,\gamma>0$ such that,

$$
\mathcal{H}(\mathsf{\bm{A}};\epsilon)_{\mathsf{\bm{V}}}\geq C\exp\left(c\epsilon^{-1/\gamma}\right),\quad\forall\,\epsilon>0. \tag{4.6}
$$

Let $\{\Sigma_{n}\}_{n\in\mathbb{N}}$ be a family of subsets $\Sigma_{n}\subset\mathsf{\bm{V}}$ with $|\Sigma_{n}|\leq 2^{n}$ elements. Then generic elements $f\in\mathsf{\bm{A}}$ cannot be approximated by elements of $\Sigma_{n}$ at convergence rate better than $\log(n)^{-\gamma}$; more precisely, for any sequence $\epsilon_{n}=o(\log(n)^{-\gamma})$, the subset $\mathsf{\bm{R}}\subset\mathsf{\bm{A}}$, consisting of all $f\in\mathsf{\bm{A}}$, such that

$$
\inf_{\psi_{n}\in\Sigma_{n}}\|f-\psi_{n}\|\neq O(\epsilon_{n}), \tag{4.7}
$$

is residual.

Before coming to the proof of Proposition 4.2, we note that since $\mathsf{\bm{A}}\subset\mathsf{\bm{V}}$ is compact, $\mathsf{\bm{A}}$ is a complete metric space in the subspace topology. In particular, the following argument, which is based on the Baire category theorem, can be applied to $\mathsf{\bm{A}}$ (cp. Appendix A for a summary).

Proof.

Let $\mathsf{\bm{R}}:=\mathsf{\bm{A}}\setminus\mathsf{\bm{E}}$, where $\mathsf{\bm{E}}$ is defined by (4.3). Recall that $\mathsf{\bm{E}}$ is precisely the set of $f\in\mathsf{\bm{A}}$ for which there exists $M_{f}>0$ such that

$$
\inf_{\psi_{n}\in\Sigma_{n}}\|f-\psi_{n}\|\leq M_{f}\epsilon_{n}.
$$

In Lemma 4.1, it is shown that if $\mathsf{\bm{E}}_{M}\subset\mathsf{\bm{A}}$ has non-empty interior then there exists a constant $\lambda>0$, such that

$$
\log_{2}\mathcal{N}(\mathsf{\bm{A}};\lambda\epsilon_{n})\leq n,\quad\forall\,n\in\mathbb{N}.
$$

By assumption on $\mathsf{\bm{A}}$, the left hand side is lower bounded by $C\exp\left(c(\lambda\epsilon_{n})^{-1/\gamma}\right)$. Thus, if $\mathsf{\bm{E}}_{M}$ has non-empty interior, then we must have

$$
C\exp\left(c(\lambda\epsilon_{n})^{-1/\gamma}\right)\leq n\quad\Rightarrow\quad\epsilon_{n}\gtrsim\log(n)^{-\gamma},\;\text{as }n\to\infty.
$$

But by the assumption that $\epsilon_{n}=o(\log(n)^{-\gamma})$, this last lower bound cannot hold, asymptotically as $n\to\infty$. Thus, we conclude that $\mathsf{\bm{E}}_{M}\subset\mathsf{\bm{A}}$ has empty interior for any $M>0$. We furthermore note that $\mathsf{\bm{E}}_{M}$ is closed; indeed, $\mathsf{\bm{E}}_{M}$ in (4.2) is given by,

$$
\mathsf{\bm{E}}_{M}=\bigcap_{n=1}^{\infty}\bigcup_{\psi_{n}\in\Sigma_{n}}\overline{B_{M\epsilon_{n}}(\psi_{n})}, \tag{4.8}
$$

where we define the closed balls (in the induced topology on $\mathsf{\bm{A}}$),

$$
\overline{B_{M\epsilon_{n}}(\psi)}:=\left\{f\in\mathsf{\bm{A}}\,\middle|\,\|f-\psi\|\leq M\epsilon_{n}\right\}\subset\mathsf{\bm{A}}.
$$

Therefore $\mathsf{\bm{E}}_{M}$ can be written as an intersection of a union of closed balls of radius $M\epsilon_{n}$ centered at elements $\psi\in\Sigma_{n}$. Note that, since the set $\Sigma_{n}$ is finite by assumption, the union of these closed balls,

$$
\mathsf{\bm{E}}_{M,n}:=\bigcup_{\psi_{n}\in\Sigma_{n}}\overline{B_{M\epsilon_{n}}(\psi_{n})},
$$

is closed for any $n\in\mathbb{N}$, implying that also $\mathsf{\bm{E}}_{M}=\bigcap_{n=1}^{\infty}\mathsf{\bm{E}}_{M,n}\subset\mathsf{\bm{A}}$ is closed as an intersection of closed sets.

To conclude the proof, we simply note that $\mathsf{\bm{E}}=\bigcup_{M\in\mathbb{N}}\mathsf{\bm{E}}_{M}$ can be written as a countable union, for integer $M\in\mathbb{N}$, of closed subsets with empty interior $\mathsf{\bm{E}}_{M}$. In particular, this implies that $\mathsf{\bm{E}}$ is itself meagre by the Baire category theorem. We conclude that the complement $\mathsf{\bm{R}}:=\mathsf{\bm{A}}\setminus\mathsf{\bm{E}}$, consisting of all $f\in\mathsf{\bm{A}}$ for which

$$
\inf_{\psi_{n}\in\Sigma_{n}}\|f-\psi_{n}\|\neq O(\epsilon_{n}),
$$

is residual. This completes the proof. ∎

A similar result can also be derived under the assumption of an algebraic scaling. This may be of relevance for generic function approximation by neural networks, and hence we mention it here, in passing.

Proposition 4.3 (Algebraic scaling).

Let $\mathsf{\bm{V}}$ be a Banach space. Let $\mathsf{\bm{A}} \subset \mathsf{\bm{V}}$ be a compact, convex subset. Assume that there exist constants $C, \gamma > 0$ such that

$$\log \mathcal{N}(\mathsf{\bm{A}}; \epsilon) \geq C \epsilon^{-1/\gamma}, \quad \forall\, \epsilon > 0. \tag{4.9}$$

Let $\{\Sigma_n\}_{n \in \mathbb{N}}$ be a family of subsets $\Sigma_n \subset \mathsf{\bm{V}}$ with $|\Sigma_n| \leq 2^n$ elements. Then generic elements $f \in \mathsf{\bm{A}}$ cannot be approximated by elements of $\Sigma_n$ at a convergence rate better than $n^{-\gamma}$; more precisely, for any sequence $\epsilon_n = o(n^{-\gamma})$, the subset $\mathsf{\bm{R}} \subset \mathsf{\bm{A}}$, such that

$$\inf_{\psi_n \in \Sigma_n} \|f - \psi_n\| \neq O(\epsilon_n), \quad \forall\, f \in \mathsf{\bm{R}}, \tag{4.10}$$

is residual.

Proof.

Let $\mathsf{\bm{R}} := \mathsf{\bm{A}} \setminus \mathsf{\bm{E}}$, where $\mathsf{\bm{E}}$ is defined by (4.3). Recall that $\mathsf{\bm{E}}$ is precisely the set of $f \in \mathsf{\bm{A}}$ for which there exists $M_f > 0$ such that

$$\inf_{\psi_n \in \Sigma_n} \|f - \psi_n\| \leq M_f \epsilon_n, \quad \forall\, n \in \mathbb{N}.$$

In Lemma 4.1, it is shown that if $\mathsf{\bm{E}}_M \subset \mathsf{\bm{A}}$ has non-empty interior, then there exists a constant $\lambda > 0$ such that

$$\log \mathcal{N}(\mathsf{\bm{A}}; \lambda \epsilon_n) \leq n, \quad \forall\, n \in \mathbb{N}.$$

By assumption on $\mathsf{\bm{A}}$, the left-hand side is lower bounded by $C(\lambda \epsilon_n)^{-1/\gamma}$. Thus, if $\mathsf{\bm{E}}_M$ has non-empty interior, then we must have

$$C(\lambda \epsilon_n)^{-1/\gamma} \leq n \quad \Rightarrow \quad \epsilon_n \gtrsim n^{-\gamma}, \; \text{as } n \to \infty.$$

Since $\epsilon_n = o(n^{-\gamma})$ by assumption, this is not the case. We conclude that $\mathsf{\bm{E}}_M \subset \mathsf{\bm{A}}$ has empty interior for any $M > 0$. Arguing as in the proof of the preceding proposition, it follows that $\mathsf{\bm{E}}$ is meagre, and hence $\mathsf{\bm{R}} = \mathsf{\bm{A}} \setminus \mathsf{\bm{E}}$ is residual. ∎
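The key inequality of this proof can be checked numerically. The following minimal sketch uses illustrative values of $C$, $\gamma$ and $\lambda$, which are assumptions and not values taken from the paper; the function names are hypothetical. It computes the smallest $\epsilon_n$ compatible with $C(\lambda\epsilon_n)^{-1/\gamma} \leq n$ and searches for the first $n$ at which a sample sequence $\epsilon_n = n^{-\gamma}/\log(n+1) = o(n^{-\gamma})$ falls below this floor.

```python
import math


def entropy_floor(n, C, gamma, lam):
    """Smallest eps with C * (lam * eps) ** (-1 / gamma) <= n.

    Rearranging the inequality gives eps >= (C / n) ** gamma / lam.
    """
    return (C / n) ** gamma / lam


def first_violation(eps_seq, C, gamma, lam, n_max=10_000):
    """Return the first n <= n_max with eps_seq(n) < entropy_floor(n), or None."""
    for n in range(1, n_max + 1):
        if eps_seq(n) < entropy_floor(n, C, gamma, lam):
            return n
    return None


# Illustrative constants (assumptions, not values from the paper).
C, gamma, lam = 1.0, 0.5, 3.0


def eps_seq(n):
    """Sample sequence decaying faster than n ** (-gamma) by a log factor."""
    return n ** (-gamma) / math.log(n + 1)


n_star = first_violation(eps_seq, C, gamma, lam)
print("first n violating the entropy floor:", n_star)
```

For any fixed $\lambda > 0$, a sequence of this type eventually drops below the floor, which is the contradiction exploited in the proof.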

5. Conclusion

Operator learning is a new paradigm for the data-driven approximation of operators. Popular operator learning frameworks extend and generalize neural networks to this infinite-dimensional setting. While there are numerous papers demonstrating the potential and practical utility of proposed neural operator architectures, our understanding of the precise conditions under which operator learning is practically feasible remains limited.

This paper contributes to the mathematical underpinnings of this field by providing an information-theoretic perspective on the curse of parametric complexity (a scaling-limit of the curse of dimensionality) identified in [34]. In particular, it is shown that this curse poses a fundamental limitation to operator learning on general spaces of Lipschitz operators. Bit-encoding (storing in memory) any neural operator architecture capable of achieving approximation accuracy $\epsilon$ for general $1$-Lipschitz continuous and real-valued operators requires a number of bits that is exponential in $\epsilon^{-1}$. It is shown that this is true not only when measuring the approximation error in the sup-norm over compact sets of input functions, but also when measuring the error in the $L^p(\mu)$-norm with respect to a probability measure $\mu$ satisfying certain structural assumptions. The assumptions are met for widely considered $\mu$, including the case of a Gaussian random field with at most algebraically decreasing eigenvalues of the covariance. These results rely on minimax analysis and, in contrast to prior work [34], are independent of the activation function employed in the architecture.

Going beyond such minimax analysis, we furthermore study the approximation of individual Lipschitz operators by a sequence of neural operator architectures. Such a sequence would e.g. be obtained when increasing the width, depth or other hyperparameters at a pre-defined rate as the model is scaled up. In this setting, we address the following question: “At which rate can the approximation error along such a sequence decrease, as a function of the total number of bit-encoded parameters?” Using topological arguments based on Baire category, we establish a quantitative relation between the metric entropy of the set of $1$-Lipschitz operators, and the best approximation rate that can be achieved along such a sequence for generic $1$-Lipschitz operators; as a consequence of the exponential increase in metric $\epsilon$-entropy of the set of $1$-Lipschitz operators, it is shown that achievable approximation rates are at most logarithmic as a function of the required encoding bits.

Finally, this abstract analysis leads to a concrete result on the approximation of generic Lipschitz operators by Fourier neural operators (FNOs). Our results imply that for generic $1$-Lipschitz operators, and under mild assumptions on the tunable parameters, there cannot exist a sequence of FNO approximations which converges to the underlying operator at a rate faster than logarithmic in the number of real-valued parameters. To obtain this result, mild bounds on the growth of the parameters of FNO approximants are assumed; specifically, the size of each individual parameter is assumed to be exponentially bounded by the total number of parameters, as the model size is scaled up.

The results of this work should be compared and contrasted with the recent work [50], which shows the surprising result that there exist (non-standard) neural operator architectures capable of approximating Lipschitz continuous operators to accuracy $\epsilon$, with a number of real-valued tunable parameters $q$ growing only algebraically with $\epsilon^{-1}$. The analysis of the present work indicates that a practical implementation of such architectures on computing hardware, with parameters encoded by a total of $B$ bits, will require $B$ to be exponentially large in $\epsilon^{-1}$. In fact, if each parameter is encoded by $b_1$ bits, then a lower bound of the following form is to be expected:

$$q b_1 \geq C \exp(c \epsilon^{-\gamma}),$$

for fixed constants $C, c, \gamma > 0$ independent of $\epsilon$. In particular, if $q \lesssim \epsilon^{-\lambda}$ grows at most algebraically, as in the construction of [50], then the number of encoding bits $b_1$ per parameter must necessarily grow exponentially. Thus, the only trade-off that appears possible from an information-theoretic perspective is to reduce the number of parameters $q$ at the expense of the required number of bits per parameter $b_1$, or vice versa. In turn, the required number of encoding bits is intimately linked to the stability of the mapping $\theta \mapsto \Phi({\,\cdot\,}; \theta)$ from parameters $\theta$ to the corresponding realization of the neural operator $\Phi({\,\cdot\,}; \theta)$; an exponentially growing number of bits $b_1$ is only required if the parameter-to-realization mapping is very unstable, e.g. having a very large Lipschitz constant, or if the optimal parameters themselves are very large. Here, “large” means that either the Lipschitz constant or the $\ell^\infty$-norm of the parameters grows exponentially with $\epsilon^{-1}$.
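The trade-off can be made concrete with a small numerical sketch. It assumes that a bound of the form $q b_1 \geq C \exp(c \epsilon^{-\gamma})$ holds and uses placeholder constants $C = c = \gamma = 1$ and $\lambda = 2$, which are illustrative assumptions only; the function name is hypothetical.

```python
import math


def min_bits_per_parameter(eps, q, C=1.0, c=1.0, gamma=1.0):
    """Lower bound on b_1 implied by q * b_1 >= C * exp(c * eps ** (-gamma))."""
    return C * math.exp(c * eps ** (-gamma)) / q


# Assumed algebraic growth q ~ eps ** (-lam) with lam = 2 (illustrative only).
lam = 2.0
for eps in (0.5, 0.2, 0.1, 0.05):
    q = eps ** (-lam)
    b1 = min_bits_per_parameter(eps, q)
    print(f"eps={eps:<5} q={q:8.1f} b1 >= {b1:.3e}")
```

Under these assumed constants, the algebraic growth of $q$ is quickly overwhelmed by the exponential on the right-hand side, so the per-parameter bit count must absorb the growth.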

The results of this work underline the fundamental character of the curse of parametric complexity identified in [34] from the point of view of information theory. In addition, it is here shown that this curse persists even when the sup-norm (uniform approximation of the underlying operator) is replaced by an a priori much weaker $L^p$-norm (approximation in expectation). This considerably constrains the generality with which approximation theory for operator learning, guaranteeing efficient approximation by neural operators at algebraic convergence rates, can be developed. A complete or partial characterization of the relevant mathematical properties and structures enabling efficient operator approximation would be highly desirable. The results presented in this work demonstrate rigorously that one has to go beyond Lipschitz operators to achieve this.

Acknowledgments

The author would like to thank Andrew M. Stuart and Nikola B. Kovachki for interesting discussions which have led to this work. This work has been supported by funding from the Swiss National Science Foundation through Postdoc.Mobility grant P500PT-206737.

References

  • [1] E. M. Achour, A. Foucault, S. Gerchinovitz, and F. Malgouyres. A general approximation lower bound in $L^p$ norm, with applications to feed-forward neural networks. In Advances in Neural Information Processing Systems, 2022.
  • [2] B. Adcock, S. Brugiapaglia, N. Dexter, and S. Moraga. On efficient algorithms for computing near-best polynomial approximations to high-dimensional, Hilbert-valued functions from limited samples. arXiv preprint arXiv:2203.13908, 2022.
  • [3] B. Adcock, N. Dexter, and S. Moraga. Optimal approximation of infinite-dimensional holomorphic functions. arXiv preprint arXiv:2305.18642, 2023.
  • [4] B. Adcock, N. Dexter, and S. Moraga. Optimal approximation of infinite-dimensional holomorphic functions II: recovery from iid pointwise samples. arXiv preprint arXiv:2310.16940, 2023.
  • [5] J. A. L. Benitez, T. Furuya, F. Faucher, A. Kratsios, X. Tricoche, and M. V. de Hoop. Out-of-distributional risk bounds for neural operators with applications to the Helmholtz equation. arXiv preprint arXiv:2301.11509, 2023.
  • [6] K. Bhattacharya, B. Hosseini, N. B. Kovachki, and A. M. Stuart. Model reduction and neural networks for parametric PDEs. The SMAI Journal of Computational Mathematics, 7:121–157, 2021.
  • [7] M. S. Birman and M. Z. Solomyak. Approximation of functions of the $W_p^{\alpha}$-classes by piece-wise-polynomial functions. Doklady Akademii Nauk, 171(5):1015–1018, 1966.
  • [8] H. Bölcskei, P. Grohs, G. Kutyniok, and P. Petersen. Memory-optimal neural network approximation. In Wavelets and Sparsity XVII, volume 10394, pages 157–168. SPIE, 2017.
  • [9] H. Bölcskei, P. Grohs, G. Kutyniok, and P. Petersen. Optimal approximation with sparsely connected deep neural networks. SIAM Journal on Mathematics of Data Science, 1(1):8–45, 2019.
  • [10] J. Castro. The Kolmogorov infinite dimensional equation in a Hilbert space via deep learning methods. Journal of Mathematical Analysis and Applications, 527(2):127413, 2023.
  • [11] J. Castro, C. Muñoz, and N. Valenzuela. The Calderón’s problem via DeepONets. arXiv preprint arXiv:2212.08941, 2022.
  • [12] T. Chen and H. Chen. Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems. IEEE Transactions on Neural Networks, 6(4):911–917, 1995.
  • [13] A. Cohen, R. DeVore, G. Petrova, and P. Wojtaszczyk. Optimal stable nonlinear approximation. Foundations of Computational Mathematics, 22(3):607–648, 2022.
  • [14] A. Cohen, R. DeVore, and C. Schwab. Convergence rates of best N-term Galerkin approximations for a class of elliptic SPDEs. Foundations of Computational Mathematics, 10(6):615–646, 2010.
  • [15] A. Cohen, R. DeVore, and C. Schwab. Analytic regularity and polynomial approximation of parametric and stochastic elliptic PDEs. Analysis and Applications, 9(01):11–47, 2011.
  • [16] S. Dahlke, F. De Mari, P. Grohs, and D. Labate. Harmonic and applied analysis. Appl. Numer. Harmon. Anal, 2015.
  • [17] B. Deng, Y. Shin, L. Lu, Z. Zhang, and G. E. Karniadakis. Convergence rate of DeepONets for learning operators arising from advection-diffusion equations. arXiv preprint arXiv:2102.10621, 2021.
  • [18] R. DeVore, B. Hanin, and G. Petrova. Neural network approximation. Acta Numerica, 30:327–444, 2021.
  • [19] D. Elbrächter, D. Perekrestenko, P. Grohs, and H. Bölcskei. Deep neural network approximation theory. IEEE Transactions on Information Theory, 67(5):2581–2623, 2021.
  • [20] N. R. Franco, S. Fresca, A. Manzoni, and P. Zunino. Approximation bounds for convolutional neural networks in operator learning. Neural Networks, 161:129–141, 2023.
  • [21] L. Galimberti, A. Kratsios, and G. Livieri. Designing universal causal deep learning models: The case of infinite-dimensional dynamical systems from stochastic analysis. arXiv preprint arXiv:2210.13300, 2022.
  • [22] L. Herrmann, C. Schwab, and J. Zech. Neural and GPC operator surrogates: Construction and expression rate bounds. arXiv preprint arXiv:2207.04950, 2022.
  • [23] N. Hua and W. Lu. Basis operator network: A neural network-based model for learning nonlinear operators via neural basis. Neural Networks, 164:21–37, 2023.
  • [24] D. Z. Huang, N. H. Nelsen, and M. Trautner. An operator learning perspective on parameter-to-observable maps. arXiv preprint arXiv:2402.06031, 2024.
  • [25] P. Jin, S. Meng, and L. Lu. MIONet: Learning multiple-input operators via tensor product. SIAM Journal on Scientific Computing, 44(6):A3490–A3514, 2022.
  • [26] A. N. Kolmogorov and V. M. Tikhomirov. $\epsilon$-entropy and $\epsilon$-capacity of sets in functional spaces. Amer. Math. Soc. Transl. Ser. 2, 17, 1961.
  • [27] Y. Korolev. Two-layer neural networks with values in a Banach space. SIAM Journal on Mathematical Analysis, 54(6):6358–6389, 2022.
  • [28] N. B. Kovachki, S. Lanthaler, and H. Mhaskar. Data complexity estimates for operator learning, 2024.
  • [29] N. B. Kovachki, S. Lanthaler, and S. Mishra. On universal approximation and error bounds for Fourier neural operators. Journal of Machine Learning Research, 22(1), 2021.
  • [30] N. B. Kovachki, S. Lanthaler, and A. M. Stuart. Operator learning: Algorithms and analysis. In Numerical Analysis meets Machine Learning, Handbook of Numerical Analysis. Elsevier, 2024.
  • [31] N. B. Kovachki, Z. Li, B. Liu, K. Azizzadenesheli, K. Bhattacharya, A. Stuart, and A. Anandkumar. Neural operator: Learning maps between function spaces with applications to PDEs. Journal of Machine Learning Research, 24(89), 2023.
  • [32] A. Kratsios, T. Furuya, J. A. L. Benitez, M. Lassas, and M. de Hoop. Mixture of experts soften the curse of dimensionality in operator learning, 2024.
  • [33] G. Kutyniok, P. Petersen, M. Raslan, and R. Schneider. A theoretical analysis of deep neural networks and parametric PDEs. Constructive Approximation, 55(1):73–125, 2022.
  • [34] S. Lanthaler. Operator learning with PCA-Net: Upper and lower complexity bounds. Journal of Machine Learning Research, 24(318), 2023.
  • [35] S. Lanthaler, Z. Li, and A. M. Stuart. The nonlocal neural operator: Universal approximation. arXiv preprint arXiv:2304.13221, 2023.
  • [36] S. Lanthaler, S. Mishra, and G. E. Karniadakis. Error estimates for DeepONets: A deep learning framework in infinite dimensions. Transactions of Mathematics and Its Applications, 6(1), 2022.
  • [37] S. Lanthaler and N. H. Nelsen. Error bounds for learning with vector-valued random features. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
  • [38] Z. Lei, L. Shi, and C. Zeng. Solving parametric partial differential equations with deep rectified quadratic unit neural networks. Journal of Scientific Computing, 93(3):80, 2022.
  • [39] Z. Li, N. B. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. M. Stuart, and A. Anandkumar. Fourier neural operator for parametric partial differential equations. In Ninth International Conference on Learning Representations, 2021.
  • [40] H. Liu, H. Yang, M. Chen, T. Zhao, and W. Liao. Deep nonparametric estimation of operators between infinite dimensional spaces. Journal of Machine Learning Research, 25(24):1–67, 2024.
  • [41] L. Lu, P. Jin, G. Pang, Z. Zhang, and G. E. Karniadakis. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3):218–229, 2021.
  • [42] V. Maiorov and A. Pinkus. Lower bounds for approximation by MLP neural networks. Neurocomputing, 25(1):81–91, 1999.
  • [43] C. Marcati and C. Schwab. Exponential convergence of deep operator networks for elliptic partial differential equations. SIAM Journal on Numerical Analysis, 61(3):1513–1545, 2023.
  • [44] H. N. Mhaskar and N. Hahm. Neural networks for functional approximation and system identification. Neural Computation, 9(1):143–159, 1997.
  • [45] J. Munkres. Topology. Pearson Education Limited, 2nd edition, 2014.
  • [46] N. H. Nelsen and A. M. Stuart. The random feature model for input-output maps between Banach spaces. SIAM Journal on Scientific Computing, 43(5):A3212–A3243, 2021.
  • [47] J. A. Opschoor, C. Schwab, and J. Zech. Exponential ReLU DNN expression of holomorphic maps in high dimension. Constructive Approximation, 55(1):537–582, 2022.
  • [48] D. Patel, D. Ray, M. R. Abdelmalik, T. J. Hughes, and A. A. Oberai. Variationally mimetic operator networks. Computer Methods in Applied Mechanics and Engineering, 419:116536, 2024.
  • [49] P. Petersen and F. Voigtlaender. Optimal approximation of piecewise smooth functions using deep ReLU neural networks. Neural Networks, 108:296–330, 2018.
  • [50] C. Schwab, A. Stein, and J. Zech. Deep operator network approximation rates for Lipschitz operators. arXiv preprint arXiv:2307.09835, 2023.
  • [51] C. Schwab and J. Zech. Deep learning in high dimension: Neural network expression rates for generalized polynomial chaos expansions in UQ. Analysis and Applications, 17(01):19–55, 2019.
  • [52] C. Schwab and J. Zech. Deep learning in high dimension: Neural network expression rates for analytic functions in $L^2(\mathbb{R}^d, \gamma_d)$. SIAM/ASA Journal on Uncertainty Quantification, 11(1):199–234, 2023.
  • [53] F. Voigtlaender and P. Petersen. Approximation in $L^p(\mu)$ with deep ReLU neural networks. In 2019 13th International Conference on Sampling Theory and Applications (SampTA), pages 1–4. IEEE, 2019.
  • [54] Z. Zhang, L. Tat, and H. Schaeffer. BelNet: Basis enhanced learning, a mesh-free neural operator. Proceedings of the Royal Society A, 479, 2023.

Appendix A A short summary of Baire category

In this appendix, we recall the Baire category theorem from general topology. For a more thorough discussion of this result, and its connections to other topological concepts, we refer to the textbook [45, Chap. 8].

Let $X$ be a topological space. Let $A \subset X$ be a subset. We recall that the interior of $A$ is defined as the union of all open sets of $X$ that are contained in $A$. The set $A$ is said to have empty interior if $A$ contains no open set of $X$ other than the empty set. Equivalently, $A$ has empty interior if the complement of $A$ is dense in $X$. We then have the following definition [45, Chap. 8, p. 293]:

Definition A.1.

A space $X$ is said to be a Baire space if the following condition holds: Given any countable collection $\{A_n\}$ of closed sets of $X$, each of which has empty interior in $X$, their union $\bigcup_n A_n$ also has empty interior in $X$.

This definition can equivalently be stated in terms of open sets [45, Lemma 48.1]:

Lemma A.2.

$X$ is a Baire space if and only if given any countable collection $\{U_n\}$ of open sets in $X$, each of which is dense in $X$, their intersection $\bigcap_n U_n$ is also dense in $X$.

The following Baire category theorem [45, Thm. 48.2] exposes many examples of Baire spaces encountered in applications:

Theorem A.3 (Baire category theorem).

If $X$ is a compact Hausdorff space or a complete metric space, then $X$ is a Baire space.

Appendix B Proof of the quantization lemma

The goal of this appendix is to prove the FNO quantization lemma, Lemma 2.21.

Proof of Lemma 2.21.

Let $\Phi_q$ be an output-averaged FNO with at most $q$ tunable parameters. We first note that the depth of $\Phi_q$ can only take the values $L \in \{1, \dots, q\}$. For each possible value of the depth, we now consider the maximally connected output-averaged FNO architecture $\widehat{\Phi}_q^{(L)}$ of depth $L$, obtained by setting $\kappa, d_c = q$ in each layer. This maximally connected FNO architecture has at most

$$\widehat{q}^{(L)} \leq 5 (2\kappa)^d L d_c^2 \leq 5 \cdot 2^d q^{d+3},$$

tunable parameters. For later reference, we note that

Observation 1: Any output-averaged FNO $\Phi_q({\,\cdot\,}; \theta)$ with depth $L$ and at most $q$ parameters can be represented by a specific choice of the weights of $\widehat{\Phi}_q^{(L)}({\,\cdot\,}; \widehat{\theta})$. In fact, this only requires zero-padding $\theta$ to obtain $\widehat{\theta}$.
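The zero-padding argument of Observation 1 can be illustrated on a finite-dimensional analogue. The sketch below uses a plain two-layer network rather than an FNO, which is a simplifying assumption, and all function names are hypothetical. It pads the hidden layer with units whose incoming and outgoing weights vanish. The activation is chosen with $\sigma(0) \neq 0$ to show that the padded units still do not affect the output, since their outgoing weights are zero.

```python
import math


def sigmoid(x):
    """Activation with sigmoid(0) = 0.5, i.e. nonzero at the origin."""
    return 1.0 / (1.0 + math.exp(-x))


def mlp(x, W1, b1, W2, b2):
    """Two-layer network W2 @ sigmoid(W1 @ x + b1) + b2 on nested lists."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W1, b1)
    ]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(W2, b2)]


def pad_hidden(W1, b1, W2, width):
    """Zero-pad the hidden layer to `width` units without changing the map."""
    extra = width - len(W1)
    n_in = len(W1[0])
    W1_pad = [list(row) for row in W1] + [[0.0] * n_in for _ in range(extra)]
    b1_pad = list(b1) + [0.0] * extra
    # Outgoing weights of the new units are zero, so their value is irrelevant.
    W2_pad = [list(row) + [0.0] * extra for row in W2]
    return W1_pad, b1_pad, W2_pad


# Small network with 2 hidden units, embedded into one with 5 hidden units.
W1, b1 = [[0.5, -1.0], [2.0, 0.25]], [0.1, -0.3]
W2, b2 = [[1.5, -0.5]], [0.2]
x = [0.7, -1.2]

W1_pad, b1_pad, W2_pad = pad_hidden(W1, b1, W2, width=5)
y_small = mlp(x, W1, b1, W2, b2)
y_padded = mlp(x, W1_pad, b1_pad, W2_pad, b2)
```

The padded parameter vector lives in the larger parameter space but realizes the same map, which is the property used in the proof.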

Our main goal is to suitably quantize $\widehat{\Phi}_q^{(L)}$, and then define a quantized neural operator architecture $\widetilde{\Phi}_{n_q}$ with $n_q$ bits which can represent all quantized $\widehat{\Phi}_q^{(L)}$ for $L = 1, \dots, q$ by a specific setting of its bitwise-encoded parameters.

It follows from [28, Proposition D.15], with a minimal extension to allow for $\sigma(0) \neq 0$, that the Lipschitz constant of the mapping,

$$R_q^{(L)}: \; \left\{ \begin{aligned} [-M_q, M_q]^{\widehat{q}} &\to C(\mathcal{K}), \\ \theta &\mapsto \widehat{\Phi}_q^{(L)}({\,\cdot\,}; \theta), \end{aligned} \right.$$

and with $[-M_q, M_q]^{\widehat{q}}$ metrized by the $\ell^\infty$-norm, can be bounded by

$$\mathrm{Lip}(R_q^{(L)}) \leq (L+2) (2 d_c M_q)^{L+2} \left( C + (2\kappa)^{d/2} \right).$$

Here, $C > 0$ is a constant depending only on $d$ and $\mathcal{K}$. In particular, there exists a (larger) constant $C = C(d, \mathcal{K})$, such that

$$\mathrm{Lip}(R_q^{(L)}) \leq (Cq)^{Cq} = \exp(Cq \log(Cq)).$$
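A Lipschitz bound of this kind controls the effect of rounding the parameters, which is the mechanism behind quantization: if each coordinate of $\theta$ is rounded to a grid of spacing $\delta$, then $\theta$ moves by at most $\delta/2$ in the $\ell^\infty$-norm, and the realization moves by at most $\mathrm{Lip}(R_q^{(L)}) \, \delta / 2$. The following minimal sketch counts the bits needed to index such a grid. The function names and the numerical values are hypothetical and chosen for illustration only.

```python
import math


def grid_size(M, delta):
    """Number of equidistant points of spacing delta needed to cover [-M, M]."""
    return math.ceil(2.0 * M / delta) + 1


def encoding_bits(M, delta, n_params):
    """Bits needed to index one point of the product grid in n_params coordinates."""
    return math.ceil(n_params * math.log2(grid_size(M, delta)))


def round_to_grid(theta, M, delta):
    """Round each coordinate to the nearest grid point -M + k * delta."""
    return [-M + round((t + M) / delta) * delta for t in theta]


def realization_error_bound(lip, delta):
    """Bound lip * delta / 2 on the change of the realization after rounding."""
    return 0.5 * lip * delta


# Illustrative values (assumptions): parameter bound M, spacing delta.
M, delta = 2.0, 0.25
theta = [-2.0, -0.37, 1.99]
theta_q = round_to_grid(theta, M, delta)
max_shift = max(abs(a - b) for a, b in zip(theta, theta_q))
bits = encoding_bits(M, delta, n_params=len(theta))
```

Choosing $\delta$ inversely proportional to the Lipschitz bound keeps the realization error at a prescribed level, at the cost of a bit count that grows with the logarithm of that bound.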

We quantize Φ^q(L)superscriptsubscript^Φ𝑞𝐿\widehat{\Phi}_{q}^{(L)}over^ start_ARG roman_Φ end_ARG start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_L ) end_POSTSUPERSCRIPT for θ[Mq,Mq]q^𝜃superscriptsubscript𝑀𝑞subscript𝑀𝑞^𝑞\theta\in[-M_{q},M_{q}]^{\widehat{q}}italic_θ ∈ [ - italic_M start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT , italic_M start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT over^ start_ARG italic_q end_ARG end_POSTSUPERSCRIPT by subdividing each coordinate direction by equidistant points of separation log(q)γ/exp(Cqlog(Cq))\sim\log(q)^{-\gamma}/\exp(Cq\log(Cq))∼ roman_log ( italic_q ) start_POSTSUPERSCRIPT - italic_γ end_POSTSUPERSCRIPT / roman_exp ( italic_C italic_q roman_log ( italic_C italic_q ) ). Denote the resulting discrete set of points by Θq(L)q^subscriptsuperscriptΘ𝐿𝑞superscript^𝑞\Theta^{(L)}_{q}\subset\mathbb{R}^{\widehat{q}}roman_Θ start_POSTSUPERSCRIPT ( italic_L ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_q end_POSTSUBSCRIPT ⊂ blackboard_R start_POSTSUPERSCRIPT over^ start_ARG italic_q end_ARG end_POSTSUPERSCRIPT. We note that this subdivision requires at most,

\[
O\left(\,\big\{M_{q}\log(q)^{\gamma}\exp(Cq\log(Cq))\big\}^{\widehat{q}}\,\right)
\]

many quantization points, which can be encoded by

\[
O\Big(\,\widehat{q}\,\log\big(\,M_{q}\log(q)^{\gamma}\exp(Cq\log(Cq))\,\big)\,\Big)
\]

many bits. Since $\widehat{q}=O(q^{d+3})$, $\log(M_{q}\log(q)^{\gamma})=O(q)$ and $\log(\exp(Cq\log(Cq)))=O(q^{2})$, it follows that the number of required bits is

\[
O\left(q^{d+6}\right),
\]

i.e. $\log_{2}|\Theta^{(L)}_{q}|=O(q^{d+6})$. The implied constant here is independent of $L$. In the following, we denote $m:=d+6$. In particular, we conclude that there exists a constant $C>0$, independent of $q$, such that

\[
\max_{L=1,\dots,q}\log_{2}|\Theta^{(L)}_{q}| \leq Cq^{m}.
\]
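As a reading aid, the bit count can be unpacked term by term. The following is only a sketch that restates the three estimates quoted above; it is not a sharper result.

\[
\widehat{q}\,\log\big(M_{q}\log(q)^{\gamma}\exp(Cq\log(Cq))\big)
= \widehat{q}\,\Big[\log\big(M_{q}\log(q)^{\gamma}\big) + Cq\log(Cq)\Big]
= O(q^{d+3})\,\big[O(q)+O(q^{2})\big]
= O(q^{d+5}) \subset O(q^{d+6}).
\]

The exponent $d+6$ thus appears to leave some slack, which is harmless here, since only a polynomial upper bound in $q$ is needed.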

We also note that, by construction, for any $\theta\in[-M_{q},M_{q}]^{\widehat{q}}$, there exists $\theta^{\prime}\in\Theta^{(L)}_{q}$, such that

\[
\|\theta-\theta^{\prime}\|_{\ell^{\infty}} \leq \frac{\log(q)^{-\gamma}}{\exp(Cq\log(Cq))}.
\]
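The grid construction can be illustrated numerically. The following is a minimal sketch, assuming a uniform grid with spacing `h` that starts at `-M` in each coordinate; the function names `quantize` and `bits_needed` and the toy values are hypothetical and do not appear in the original argument.

```python
import math

def quantize(theta, M, h):
    """Round each coordinate of theta in [-M, M] to the nearest point of the
    equidistant grid {-M + k*h : k = 0, ..., K}, with K = ceil(2M / h).
    (Illustrative helper; the grid may extend slightly beyond M.)"""
    K = math.ceil(2 * M / h)
    out = []
    for t in theta:
        k = round((t + M) / h)   # index of the nearest grid point
        k = min(max(k, 0), K)    # clamp to the valid index range
        out.append(-M + k * h)
    return out

def bits_needed(n_params, M, h):
    """Bits sufficient to index one grid point in n_params dimensions:
    n_params * ceil(log2(K + 1)), where K + 1 is the number of points
    per coordinate direction."""
    K = math.ceil(2 * M / h)
    return n_params * math.ceil(math.log2(K + 1))

theta = [0.1, -1.93, 2.0]
theta_q = quantize(theta, M=2.0, h=0.25)
# The sup-norm error is at most h/2, hence in particular at most h.
err = max(abs(a - b) for a, b in zip(theta, theta_q))
```

In the proof, `h` plays the role of the separation $\log(q)^{-\gamma}/\exp(Cq\log(Cq))$, `M` the role of $M_{q}$, and `n_params` the role of $\widehat{q}$.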

It follows that for any $\theta\in[-M_{q},M_{q}]^{\widehat{q}}$, there exists $\theta^{\prime}\in\Theta^{(L)}_{q}$, such that

\begin{align*}
\|\widetilde{\Phi}^{(L)}_{q}(\,\cdot\,;\theta)-\widetilde{\Phi}^{(L)}_{q}(\,\cdot\,;\theta^{\prime})\|_{C(\mathcal{K})}
&\leq \mathrm{Lip}(R_{q}^{(L)})\,\|\theta-\theta^{\prime}\|_{\ell^{\infty}} \\
&\leq \exp(Cq\log(Cq))\,\frac{\log(q)^{-\gamma}}{\exp(Cq\log(Cq))} \\
&= \log(q)^{-\gamma}.
\end{align*}

Thus,

\[
\sup_{\theta\in[-M_{q},M_{q}]^{\widehat{q}}}\;\min_{\theta^{\prime}\in\Theta^{(L)}_{q}}\|\widetilde{\Phi}^{(L)}_{q}(\,\cdot\,;\theta)-\widetilde{\Phi}^{(L)}_{q}(\,\cdot\,;\theta^{\prime})\|_{C(\mathcal{K})} \leq \log(q)^{-\gamma}. \tag{B.1}
\]

Since $\log_{2}|\Theta^{(L)}_{q}|\leq Cq^{m}$, any $\theta^{\prime}\in\Theta^{(L)}_{q}$ can be identified with a unique bit-string in $\{0,1\}^{\ell_{q}}$, where $\ell_{q}=\lceil Cq^{m}\rceil$. Adding $O(\log(q))$ further bits to encode the possible values of the depth parameter $L\in\{1,\dots,q\}$, we can now define a quantized neural operator $\widetilde{\Phi}_{n_{q}}:L^{2}(D)\times\{0,1\}^{n_{q}}\to\mathbb{R}$ encoded by $n_{q}\sim\log(q)+\ell_{q}\sim Cq^{m}$ bits, in the following way: Given $[\theta]\in\{0,1\}^{n_{q}}$, we first read off the depth parameter $L$ from the first $\lceil\log_{2}q\rceil$ bits. Removing these bits, the remaining $\ell_{q}$ bits uniquely identify $\theta^{\prime}\in\Theta_{q}^{(L)}$, and we set

\[
\widetilde{\Phi}_{n_{q}}(\,\cdot\,;[\theta]) := \Phi^{(L)}_{q}(\,\cdot\,;\theta^{\prime}). \tag{B.2}
\]
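The bit-string layout behind (B.2) can be sketched in a few lines. This is an illustrative encoding only, assuming that grid points are labeled by an integer index in $\{0,\dots,2^{\ell_{q}}-1\}$; the names `encode`, `decode` and `depth_bits` are hypothetical, and at least one depth bit is reserved so that the case $q=1$ still parses.

```python
def depth_bits(q):
    """Number of leading bits reserved for the depth L in {1, ..., q}.
    (q - 1).bit_length() equals ceil(log2(q)) for q >= 1; at least one bit
    is used so that q = 1 is handled (an assumption of this sketch)."""
    return max(1, (q - 1).bit_length())

def encode(L, index, q, ell):
    """Pack the depth L and a grid index in {0, ..., 2**ell - 1} into a
    bit-string of length depth_bits(q) + ell."""
    assert 1 <= L <= q and 0 <= index < 2 ** ell and ell >= 1
    d = depth_bits(q)
    return format(L - 1, f"0{d}b") + format(index, f"0{ell}b")

def decode(bits, q, ell):
    """Read off L from the leading bits, then the grid index from the rest."""
    d = depth_bits(q)
    assert len(bits) == d + ell
    L = int(bits[:d], 2) + 1
    index = int(bits[d:], 2)
    return L, index

code = encode(L=5, index=9, q=5, ell=4)
recovered = decode(code, q=5, ell=4)
```

Decoding inverts encoding, which mirrors the statement that every pair $(L,\theta^{\prime})$ with $\theta^{\prime}\in\Theta^{(L)}_{q}$ corresponds to some $[\theta]\in\{0,1\}^{n_{q}}$.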

Thus, $\widetilde{\Phi}_{n_{q}}$ is a neural operator architecture with parameters encoded by $n_{q}\asymp q^{m}$ bits. By our definition (B.2), any neural operator belonging to the set

\[
\left\{\Phi^{(L)}_{q}(\,\cdot\,;\theta^{\prime})\,\middle|\,L\in\{1,\dots,q\},\;\theta^{\prime}\in\Theta^{(L)}_{q}\right\},
\]

can be represented exactly by a suitable choice of $[\theta]\in\{0,1\}^{n_{q}}$. Thus, by (B.1), we have

\[
\sup_{L=1,\dots,q}\;\sup_{\theta\in[-M_{q},M_{q}]^{\widehat{q}}}\;\min_{[\theta]\in\{0,1\}^{n_{q}}}\|\widetilde{\Phi}^{(L)}_{q}(\,\cdot\,;\theta)-\widetilde{\Phi}_{n_{q}}(\,\cdot\,;[\theta])\|_{C(\mathcal{K})} \leq \log(q)^{-\gamma}. \tag{B.3}
\]

We finally note that any neural operator architecture $\Phi_{q}$ with at most $q$ parameters is represented as $\Phi_{q}(\,\cdot\,;\theta)=\widehat{\Phi}_{q}^{(L)}(\,\cdot\,;\widehat{\theta})$ for suitably chosen $\widehat{\theta}=\widehat{\theta}(\theta)$ (see Observation 1, above). In fact, this only involves zero-padding of the weights $\theta$. In particular, if $\theta\in[-M_{q},M_{q}]^{q}$, then $\widehat{\theta}\in[-M_{q},M_{q}]^{\widehat{q}}$.

From (B.3), it follows that

\[
\sup_{\theta\in[-M_{q},M_{q}]^{q}}\;\min_{[\theta]\in\{0,1\}^{n_{q}}}\|\Phi_{q}(\,\cdot\,;\theta)-\widetilde{\Phi}_{n_{q}}(\,\cdot\,;[\theta])\|_{C(\mathcal{K})} \leq \log(q)^{-\gamma}, \tag{B.4}
\]

as claimed. This concludes the proof. ∎