License: arXiv.org perpetual non-exclusive license
arXiv:2403.02300v1 [cs.DS] 04 Mar 2024

Statistical Query Lower Bounds for Learning Truncated Gaussians

(Authors are in alphabetical order.)

Ilias Diakonikolas
University of Wisconsin-Madison
Supported by NSF Medium Award CCF-2107079, NSF Award CCF-1652862 (CAREER), a Sloan Research Fellowship, and a DARPA Learning with Less Labels (LwLL) grant.
   Daniel M. Kane
University of California, San Diego
Supported by NSF Medium Award CCF-2107547, NSF Award CCF-1553288 (CAREER), and a Sloan Research Fellowship.
   Thanasis Pittas
University of Wisconsin-Madison
Supported by NSF Medium Award CCF-2107079.
   Nikos Zarifis
University of Wisconsin-Madison
Supported in part by DARPA Learning with Less Labels (LwLL) grant and NSF Award DMS-2023239 (TRIPODS).
Abstract

We study the problem of estimating the mean of an identity covariance Gaussian in the truncated setting, in the regime when the truncation set comes from a low-complexity family $\mathcal{C}$ of sets. Specifically, for a fixed but unknown truncation set $S\subseteq\mathbb{R}^{d}$, we are given access to samples from the distribution $\mathcal{N}(\bm{\mu},\mathbf{I})$ truncated to the set $S$. The goal is to estimate $\bm{\mu}$ within accuracy $\varepsilon>0$ in $\ell_{2}$-norm. Our main result is a Statistical Query (SQ) lower bound suggesting a super-polynomial information-computation gap for this task. In more detail, we show that the complexity of any SQ algorithm for this problem is $d^{\mathrm{poly}(1/\varepsilon)}$, even when the class $\mathcal{C}$ is simple so that $\mathrm{poly}(d/\varepsilon)$ samples information-theoretically suffice. Concretely, our SQ lower bound applies when $\mathcal{C}$ is a union of a bounded number of rectangles whose VC dimension and Gaussian surface area are small. As a corollary of our construction, it also follows that the complexity of the previously known algorithm for this task is qualitatively best possible.

1 Introduction

We study the classical problem of high-dimensional statistical estimation from truncated (or censored) samples, with a focus on establishing information-computation tradeoffs. Truncation refers to the situation where samples falling outside of a fixed (potentially unknown) set are not observed. This phenomenon naturally arises in a wide range of applications across the sciences. Estimation from truncated samples has a rich history in statistics, dating back to [Ber60], who studied it in the context of smallpox vaccination. Pioneering early works include those of [Gal98], in the context of analyzing speeds of trotting horses; Pearson and Lee [Pea02, PL08, Lee14], who used the method of moments for mean and standard deviation estimation from truncated Gaussian one-dimensional data; and [Fis31], who leveraged maximum likelihood estimation for the same problem. The reader is referred to [Sch86, Coh91, BE14] for some textbooks on the topic.

Despite extensive investigation in the statistics community, the first statistically and computationally efficient algorithms for learning multivariate structured distributions in the truncated setting were developed fairly recently in the computer science community. The first such work [DGTZ18] focuses on the fundamental setting of Gaussian mean and covariance estimation, and operates under the assumption that the truncation set is known (i.e., the learner is given oracle access to it). Most relevant to the current paper is the subsequent work of [KTZ19] that studies mean estimation of a spherical Gaussian under the assumption that the truncation set is unknown and is promised to lie in a family of sets with “low complexity”. Beyond mean and covariance estimation, a related line of work has addressed a range of other statistical tasks, including linear regression [DGTZ19, DRZ20, IZD20, DSYZ21, CDIZ23], non-parametric estimation [DKTZ21], and learning other structured distribution families [FKT20, Ple21, LWZ23].

In this paper, we focus on the basic task of estimating the mean of a spherical Gaussian in the truncated setting with unknown truncation set. The setup is as follows: Let $S\subseteq\mathbb{R}^{d}$ be a fixed subset of $\mathbb{R}^{d}$ and denote by $\alpha>0$ its probability mass under $\mathcal{N}(\bm{\mu},\mathbf{I})$, the $d$-dimensional Gaussian with mean $\bm{\mu}$ and identity covariance. Given access to samples from the distribution $\mathcal{N}(\bm{\mu},\mathbf{I})$ truncated to the set $S$, the goal is to estimate $\bm{\mu}$ within accuracy $\varepsilon>0$ in $\ell_{2}$-norm. For the special case of this task where the truncation set $S$ is known to the algorithm (more accurately, the algorithm has oracle access to $S$), [DGTZ18] gave a polynomial-time algorithm that draws $\tilde{O}_{\alpha}(d/\varepsilon^{2})$ truncated samples. (The notation $\tilde{O}_{\alpha}$ suppresses poly-logarithmic dependence on its argument and some dependence on the parameter $\alpha$. In the context of the lower bounds established here, $\alpha$ will be a positive universal constant, specifically $\alpha>1/2$.) They also pointed out that if $S$ is unknown, and arbitrarily complex, then the learning problem is not solvable to better than constant accuracy, with any finite number of samples.

Although the latter statement might seem discouraging, a natural avenue to circumvent this bottleneck is restricting the set $S$ to a family of “low complexity”. For example, early work in the statistics community considered the case where $S$ is a rectangle (box) or a union of a small number of rectangles. Intuitively, for such “simple” classes of sets, positive results may be attainable, even for unknown truncation set. [KTZ19] formalized this intuition by providing the first positive results, both information-theoretic and algorithmic, for settings where the unknown set $S$ has “low complexity”. Specifically, [KTZ19] showed two (incomparable) positive results, corresponding to natural complexity measures of the family of sets containing $S$:

  1. If $S$ comes from a family of sets $\mathcal{C}$ with VC-dimension $V$, then the problem is information-theoretically solvable to $\ell_{2}$ error $\varepsilon$ with $\tilde{O}(V/\varepsilon+d^{2}/\varepsilon^{2})$ truncated samples.

  2. If $S$ comes from a family of sets $\mathcal{C}$ with Gaussian Surface Area at most $\Gamma>0$ (Definition 2.2), then the problem is solvable using sample and computational complexity $d^{\Gamma^{2}\mathrm{poly}(1/\varepsilon)}$.

For the setting of bounded VC-dimension, [KTZ19] stated that “Obtaining a computationally efficient algorithm seems unlikely, unless one restricts attention to simple specific set families […]”. For the setting of bounded surface area, the algorithm of [KTZ19] has sample and computational complexity $d^{\mathrm{poly}(1/\varepsilon)}$, even for $\Gamma=O(1)$, which is not required for simple classes of sets. This discussion serves as a natural motivation for the following question:

Are there “simple” families of sets for which learning truncated Gaussians

exhibits an information-computation tradeoff?

In more detail, is there a class of sets $\mathcal{C}$ such that our learning task is information-theoretically solvable with a few samples, and at the same time any computationally efficient algorithm requires significantly more samples?

We tackle this question in two well-studied restricted models of computation — namely, in the Statistical Query (SQ) model [Kea98] and the low-degree polynomial testing model [Hop18, KWB22]. As our main result, we answer the above question in the affirmative for both of these models. Specifically, we exhibit a family of sets with small VC dimension and small Gaussian surface area (hence, for which our problem is solvable with polynomial sample complexity), such that any SQ algorithm (and low-degree polynomial test) necessarily requires super-polynomial complexity. As a corollary of our construction, it also follows that the complexity of the algorithm in [KTZ19] (which is efficiently implementable in these models) is qualitatively best possible. Finally, we remark that the underlying family of sets used in our hard instance is quite simple — consisting of unions of a bounded number of rectangles.

1.1 Our Results

To formally state our main result, we summarize the basics of the SQ model.

SQ Model Basics The model, introduced by [Kea98] and extensively studied since (see, e.g., [FGR+13]), considers algorithms that, instead of drawing individual samples from the target distribution, have indirect access to the distribution using the following oracle:

Definition 1.1 (STAT Oracle).

Let $D$ be a distribution on $\mathbb{R}^{d}$. A statistical query is a bounded function $f:\mathbb{R}^{d}\to[-1,1]$. For $\tau>0$, the $\mathrm{STAT}(\tau)$ oracle responds to the query $f$ with a value $v$ such that $|v-\operatorname*{\mathbf{E}}_{X\sim D}[f(X)]|\leq\tau$. We call $\tau$ the tolerance of the statistical query.

An SQ lower bound for a learning problem is an unconditional statement that any SQ algorithm for the problem either needs to perform a large number $q$ of queries, or at least one query with very small tolerance $\tau$. Note that, by Hoeffding-Chernoff bounds, a query of tolerance $\tau$ is implementable by non-SQ algorithms by drawing $O(1/\tau^{2})$ samples and averaging them. Thus, an SQ lower bound intuitively serves as a tradeoff between runtime of $\Omega(q)$ and sample complexity of $\Omega(1/\tau)$.
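To make the sampling remark concrete, the following is a minimal sketch of how a sample-based algorithm might answer a single statistical query with tolerance $\tau$. The function names, the `failure_prob` parameter, and the explicit constant in the sample count are illustrative assumptions (the constant comes from Hoeffding's inequality for values in $[-1,1]$); none of them appear in the paper.

```python
import math
import numpy as np

def samples_for_tolerance(tau, failure_prob=0.01):
    """Sample size from Hoeffding's inequality for values in [-1, 1]:
    P(|empirical mean - E[f]| > tau) <= 2 * exp(-n * tau**2 / 2)."""
    return math.ceil(2.0 * math.log(2.0 / failure_prob) / tau ** 2)

def simulate_stat_query(query, sampler, tau, rng, failure_prob=0.01):
    """Answer a statistical query by averaging over O(1/tau^2) fresh samples.

    query   : maps an (n, d) array of samples to n values in [-1, 1]
    sampler : hypothetical callable (n, rng) -> (n, d) array of draws from D
    Returns the empirical mean and the number of samples used.
    """
    n = samples_for_tolerance(tau, failure_prob)
    values = np.clip(query(sampler(n, rng)), -1.0, 1.0)  # enforce boundedness
    return float(values.mean()), n

# Illustration: D = N(0, I_3) and f(x) = tanh(x_1), whose expectation is 0.
rng = np.random.default_rng(0)
gaussian_sampler = lambda n, rng: rng.standard_normal((n, 3))
answer, n_used = simulate_stat_query(lambda x: np.tanh(x[:, 0]),
                                     gaussian_sampler, tau=0.05, rng=rng)
print(answer, n_used)
```

With probability at least `1 - failure_prob`, the returned value is a valid $\mathrm{STAT}(\tau)$ response; an algorithm issuing $q$ queries would additionally need a union bound over the queries.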

Main Result We are now ready to state our main result:

Theorem 1.2 (SQ Lower Bound for Learning Truncated Gaussians).

Let $d,k\in\mathbb{Z}_{+}$, $\varepsilon>d^{-c}$ for some sufficiently small constant $c>0$, and assume $k\leq c/\varepsilon^{0.15}$. Let $\mathcal{C}$ be the class of all sets $S\subseteq\mathbb{R}^{d}$ with the properties that: (i) $S$ is the complement of a union of at most $k^{2}$ rectangles, and (ii) $S$ has $O(1)$ Gaussian surface area and $\Omega(1)$ mass under the target Gaussian. Suppose that $\mathcal{A}$ is an algorithm with the guarantee that, given SQ access to $\mathcal{N}(\bm{\mu},\mathbf{I})$ truncated on a set $S\in\mathcal{C}$ (where $\bm{\mu}$ and $S$ are unknown to the algorithm), it outputs a $\hat{\bm{\mu}}$ with $\|\hat{\bm{\mu}}-\bm{\mu}\|_{2}\leq\varepsilon$. Then, $\mathcal{A}$ either performs $2^{d^{\Omega(1)}}$ many queries, or makes at least one query with tolerance $d^{-\Omega(k)}$.

We conclude with some remarks about our main theorem. First, our SQ lower bound holds against a simple family of sets. The class $\mathcal{C}$, as (the complement of) a union of $k^{2}\leq\mathrm{poly}(1/\varepsilon)$ rectangles, has VC dimension $\mathrm{poly}(d\log(k))$. By the sample upper bound of [KTZ19], the corresponding learning problem is thus solvable with $\mathrm{poly}(d/\varepsilon)$ samples. Even for this simple class, our result suggests that any efficient SQ algorithm requires $d^{\mathrm{poly}(1/\varepsilon)}$ samples. Second, the fact that our family of sets has bounded Gaussian surface area implies that the algorithm of [KTZ19] (which fits in the SQ model) is qualitatively optimal. Finally, using a known equivalence between SQ and low-degree polynomial tests [BBH+21], a qualitatively similar lower bound holds for the latter model. This implication is formally stated in Appendix C.

1.2 Overview of Techniques

Our SQ lower bound leverages the methodology of [DKS17] and in particular the low-dimensional extension from [DKPZ21] (see also [DKRS23]), which provides a generic SQ hardness result for the problem of non-Gaussian component analysis: Fix a low-dimensional distribution $A$ on $\mathbb{R}^{m}$ with $m\ll d$, and consider the family $\mathcal{D}$ of all $d$-dimensional distributions that coincide with $A$ in some (hidden) $m$-dimensional subspace and are equal to the standard Gaussian in its orthogonal complement. The main result of that framework (cf. Fact 2.4) is that, if $A$ is itself similar to $\mathcal{N}(\mathbf{0},\mathbf{I}_{m\times m})$, in the sense that it matches its first $k$ moments with $\mathcal{N}(\mathbf{0},\mathbf{I}_{m\times m})$, then the hypothesis testing problem of distinguishing between a member of $\mathcal{D}$ and $\mathcal{N}(\mathbf{0},\mathbf{I}_{d\times d})$ requires either $2^{d^{\Omega(1)}}$ statistical queries, or a query of small tolerance $\tau<d^{-\Omega(k)}$. This generic hardness result has been the basis of SQ lower bounds for a range of tasks, including learning mixture models [DKS17, DKPZ23, DKS23], robust mean/covariance estimation [DKS17], robust linear regression [DKS19], learning halfspaces and other natural concepts with label noise [DKZ20, GGK20, DK22, DKPZ21, DKK+22], list-decodable estimation [DKS18, DKP+21], learning simple neural networks [DKKZ20], and generative models [CLL22].

Given this generic SQ lower bound, we want to formulate our learning problem as a valid instance of non-Gaussian component analysis. That is, we aim to find a (low-dimensional) distribution $A$ that matches $k=\mathrm{poly}(1/\varepsilon)$ moments with $\mathcal{N}(\mathbf{0},\mathbf{I}_{m\times m})$ and is itself a truncated Gaussian with mean $\bm{\mu}$ where $\|\bm{\mu}\|_{2}\geq\varepsilon$, truncated on a set $S$ of large mass with small Gaussian surface area and VC dimension. This would imply that learning the mean of truncated Gaussians within error $\varepsilon$ in $d$ dimensions has SQ complexity $d^{\mathrm{poly}(1/\varepsilon)}$.

A first attempt is to try to find a one-dimensional distribution $A$ for the above construction, in particular an $A$ of the form $\mathcal{N}(\varepsilon,1)$ conditioned on a set $S$ which is a union of a small number of intervals. We start by noting that it suffices to make this construction work for any finite number of intervals. Indeed, an existing technique from [DKZ20] can be leveraged to show that if $k$ moments can be matched using a finite union of intervals, they can also be matched using just $k$ intervals (Proposition 4.3). Without having to worry about the number of intervals, our proof strategy would be as follows: The first step is to create a continuous version of the construction. Namely, we wish to find a function $f:\mathbb{R}\to[0,1]$ so that if the probability density function of $\mathcal{N}(\varepsilon,1)$ is multiplied by $f$ and re-normalized, we obtain a probability distribution that matches $k$ moments with $\mathcal{N}(0,1)$ (here $f$ represents some fractional version of the indicator function of $S$). This can be done somewhat explicitly. In particular, we can take $f$ to be a carefully chosen exponential function, so that the density of $\mathcal{N}(\mu,1)$ times $f$ re-normalized is exactly $\mathcal{N}(0,1)$ (cf. Claim 5.1). Unfortunately, this $f$ will not be bounded in $[0,1]$, and in particular in the extreme tails will have $f(x)>1$. However, since so little mass lies at these tails, if we truncate $f$ to have value at most $1$, we do not change the first $k$ moments by much (cf. Claim 5.2). Then, using a technique of [DKS17] (also see Chapter 8 in [DK23]), we can modify $f$ slightly (by adding a carefully chosen polynomial times the indicator function of an interval) to fix this moment discrepancy.
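The first two steps of this sketch can be checked numerically. The snippet below is an illustration only: it assumes the exponential function has the form $f(x)=c\,e^{-\varepsilon x}$ with a hypothetical constant $c=1/2$ (the constant used in Claim 5.1 may differ), for which $\phi_{\varepsilon,1}(x)f(x)=c\,e^{-\varepsilon^{2}/2}\phi_{0,1}(x)$, and it evaluates moments by a Riemann sum on a fine grid. It does not implement the final polynomial correction step.

```python
import numpy as np

def gaussian_pdf(x, mean=0.0):
    """Density of N(mean, 1) evaluated at x."""
    return np.exp(-0.5 * (x - mean) ** 2) / np.sqrt(2.0 * np.pi)

def tilted_moments(eps, scale, k, clip, grid):
    """First k moments of the density proportional to phi_eps(x) * f(x).

    f(x) = scale * exp(-eps * x); with clip=True it is capped at 1, so that
    it takes values in [0, 1] like a fractional indicator of a set.
    """
    f = scale * np.exp(-eps * grid)
    if clip:
        f = np.minimum(f, 1.0)
    weights = gaussian_pdf(grid, eps) * f
    dx = grid[1] - grid[0]
    mass = weights.sum() * dx  # normalizing constant
    return np.array([(grid ** t * weights).sum() * dx / mass
                     for t in range(1, k + 1)])

grid = np.linspace(-30.0, 30.0, 600001)
# Without the cap, the re-normalized density is exactly N(0, 1).
exact = tilted_moments(0.1, 0.5, 4, clip=False, grid=grid)
# With the cap, f changes only where x < -ln(2)/0.1, deep in the left tail.
capped = tilted_moments(0.1, 0.5, 4, clip=True, grid=grid)
print(exact)                          # close to the N(0,1) moments 0, 1, 0, 3
print(np.abs(capped - exact).max())   # small moment discrepancy caused by the cap
```

The remaining discrepancy is what the polynomial correction described above is meant to remove.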

The above sketch gives a one-dimensional construction, where $S$ is a union of at most $k$ intervals. Unfortunately, this class of sets will have surface area approximately $k$, which is far too large for our purposes. In fact, for any reasonable one-dimensional set $S$, we will expect to have at least constant surface area (as a single point on the boundary of $S$ contributes this much). To overcome this obstacle, we will need to consider a two-dimensional construction instead (eventually given in Proposition 3.2). That is, we want to exhibit a family of sets $S\subseteq\mathbb{R}^{2}$ so that if the two-dimensional Gaussian $\mathcal{N}((\varepsilon,0),\mathbf{I}_{2\times 2})$ is conditioned on $S$, we match $k$ low-degree moments with $\mathcal{N}(\mathbf{0},\mathbf{I}_{2\times 2})$. Specifically, we will take $S$ to be the complement of an appropriate union of rectangles in $\mathbb{R}^{2}$ (cf. Figure 1). We first describe the goal of our construction for each axis separately: For the $y$-axis, we need to find a small union of intervals $U$ such that (i) the mass of $U$ is $\delta_{1}=\mathrm{poly}(\varepsilon)$, and (ii) $\mathcal{N}(0,1)$ conditioned outside of $U$ matches $k$ moments with $\mathcal{N}(0,1)$. For the $x$-axis, we need another union of intervals $T$, which also has small mass $\delta_{2}=\mathrm{poly}(\varepsilon)$, and such that the pdf of $\mathcal{N}(\varepsilon,1)$ multiplied by $(1-\delta\mathds{1}(x\in T))$ matches its first $k$ moments with $\mathcal{N}(0,1)$. The multiplication by $(1-\delta\mathds{1}(x\in T))$ is needed to take into account the $\delta$-mass removed in the $y$-axis earlier. After having these at hand, we can let $S$ be the complement of $T\times U$. By a direct computation one can show that, given the properties above, $\mathcal{N}((\varepsilon,0),\mathbf{I}_{2\times 2})$ conditioned on $S$ matches its low-degree moments with $\mathcal{N}(\mathbf{0},\mathbf{I}_{2\times 2})$ (cf. calculation in (4)). We note that the boundary of $S$ consists of $k^{2}$ rectangles, each with perimeter approximately $\delta_{1}+\delta_{2}=\mathrm{poly}(\varepsilon)$. So, if $k\ll 1/\sqrt{\delta_{1}+\delta_{2}}$, $S$ will have small Gaussian surface area. Finally, the plan for showing existence of the sets $U$ and $T$ is the following: To establish the existence of the $U$ set (cf. Lemma 3.3), we first provide an explicit construction of intervals, by splitting the real line into tiny intervals and defining $U$ to include $1-\delta$ fraction of each; and then leverage the technique from [DKZ20] to reduce the number of intervals down to $k$. The proof strategy for the set $T$ (Lemma 3.4) is essentially the one that was discussed in the previous paragraph.
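The direct computation mentioned above can be sketched as follows. This is a reconstruction under stated assumptions rather than the paper's calculation (4): we read $\delta$ as the $\mathcal{N}(0,1)$-mass of $U$, write $m_{t}$ for the $t$-th moment of $\mathcal{N}(0,1)$, write $\phi_{\varepsilon,1}$ and $\phi_{0,1}$ for the densities of $\mathcal{N}(\varepsilon,1)$ and $\mathcal{N}(0,1)$, and let $Z$ denote the normalizing constant $\int_{\mathbb{R}}\phi_{\varepsilon,1}(x)(1-\delta\mathds{1}(x\in T))\,\mathrm{d}x$; the symbols $m_{t}$, $Z$, $a$, $b$ are notation introduced here for illustration.

```latex
% Assumed properties, for all integers 0 <= a, b <= k:
%   (U)  \int_U y^b \phi_{0,1}(y) dy = \delta m_b
%        (equivalent to: N(0,1) conditioned outside U matches k moments, and U has mass \delta)
%   (T)  \int_R x^a \phi_{\varepsilon,1}(x) (1 - \delta 1(x \in T)) dx = Z m_a
\begin{align*}
\int_{(T\times U)^{c}} x^{a}y^{b}\,\phi_{\varepsilon,1}(x)\,\phi_{0,1}(y)\,\mathrm{d}x\,\mathrm{d}y
&= \int_{\mathbb{R}} x^{a}\phi_{\varepsilon,1}(x)
   \Big( m_{b}-\mathds{1}(x\in T)\int_{U}y^{b}\phi_{0,1}(y)\,\mathrm{d}y \Big)\mathrm{d}x \\
&= m_{b}\int_{\mathbb{R}} x^{a}\phi_{\varepsilon,1}(x)\,
   \big(1-\delta\,\mathds{1}(x\in T)\big)\,\mathrm{d}x \\
&= Z\,m_{a}\,m_{b}.
\end{align*}
```

Taking $a=b=0$ shows that $Z$ is also the mass of $(T\times U)^{c}$ under $\mathcal{N}((\varepsilon,0),\mathbf{I}_{2\times 2})$, so after normalization the mixed moment $\mathbf{E}[x^{a}y^{b}]$ of the truncated distribution equals $m_{a}m_{b}$, which is the corresponding moment of $\mathcal{N}(\mathbf{0},\mathbf{I}_{2\times 2})$.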

Figure 1: Truncation set in $\mathbb{R}^{2}$. The red dotted parts of the horizontal and vertical axes represent the unions of intervals $T$ and $U$, respectively. The white rectangles represent the set $T\times U$, and the remaining gray area of $\mathbb{R}^{2}$ is their complement, denoted as $(T\times U)^{c}$, on which we truncate the Gaussian distribution $\mathcal{N}((\varepsilon,0),\mathbf{I}_{2\times 2})$.

2 Preliminaries

Basic Notation We use the notation $[n]\stackrel{\mathrm{def}}{=}\{1,\ldots,n\}$. We use $\mathbb{Z}$ for positive integers, and $\|\cdot\|_{2}$ for the $\ell_{2}$-norm of vectors. We use $\mathcal{N}(\bm{\mu},\mathbf{\Sigma})$ to denote the Gaussian with mean $\bm{\mu}$ and covariance matrix $\mathbf{\Sigma}$ and use $\phi_{\bm{\mu},\mathbf{\Sigma}}(\mathbf{x})$ for its probability density function. For other distributions, we will slightly abuse notation by using the same letter for a distribution and its pdf, e.g., we will denote by $P(\mathbf{x})$ the pdf of a distribution $P$.

Definition 2.1 (Truncated Gaussian).

For a set $S\subseteq\mathbb{R}^{d}$, a vector $\bm{\mu}\in\mathbb{R}^{d}$ and a PSD matrix $\mathbf{\Sigma}\in\mathbb{R}^{d\times d}$, we define $\mathcal{N}(\bm{\mu},\mathbf{\Sigma},S)$ to be the Gaussian $\mathcal{N}(\bm{\mu},\mathbf{\Sigma})$ after truncation using the set $S$, i.e., the distribution with the following pdf (where $\phi_{\bm{\mu},\mathbf{\Sigma}}$ denotes the pdf of $\mathcal{N}(\bm{\mu},\mathbf{\Sigma})$): $\phi_{\bm{\mu},\mathbf{\Sigma},S}(\mathbf{x}):=Z^{-1}\mathds{1}(\mathbf{x}\in S)\phi_{\bm{\mu},\mathbf{\Sigma}}(\mathbf{x})$, where $Z:=\int_{\mathbb{R}^{d}}\mathds{1}(\mathbf{x}\in S)\phi_{\bm{\mu},\mathbf{\Sigma}}(\mathbf{x})\,\mathrm{d}\mathbf{x}$.
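As an illustration of this definition, the following is a minimal rejection-sampling sketch for the identity-covariance case $\mathcal{N}(\bm{\mu},\mathbf{I},S)$. It assumes access to a membership test for $S$ (the hypothetical callable `in_set`), and the batch size and iteration cap are arbitrary choices; the expected number of Gaussian draws per accepted sample is $1/Z$, so this is only practical when $S$ has non-negligible mass.

```python
import numpy as np

def sample_truncated_gaussian(mu, in_set, n, rng, batch=4096, max_batches=10000):
    """Draw n samples from N(mu, I) conditioned on the set S.

    in_set : maps a (batch, d) array to a boolean mask, True where the row lies in S
    """
    mu = np.asarray(mu, dtype=float)
    kept, count = [], 0
    for _ in range(max_batches):
        x = mu + rng.standard_normal((batch, mu.size))  # untruncated N(mu, I) draws
        accepted = x[in_set(x)]                         # keep only points inside S
        kept.append(accepted)
        count += len(accepted)
        if count >= n:
            return np.concatenate(kept)[:n]
    raise RuntimeError("acceptance rate too low; S may have tiny Gaussian mass")

# Example: S is the complement of the rectangle [0, 1] x [0, 1] in the plane.
outside_unit_square = lambda x: ~np.all((x >= 0.0) & (x <= 1.0), axis=1)
rng = np.random.default_rng(0)
samples = sample_truncated_gaussian([0.1, 0.0], outside_unit_square, 1000, rng)
print(samples.shape, samples.mean(axis=0))
```

The empirical mean of the accepted samples generally differs from `mu`, which is exactly why truncation makes mean estimation nontrivial.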

We now define Gaussian Surface Area (GSA), which has served as a complexity measure of sets in learning theory and related fields; see, e.g., [KOS08, Kan11, Nee14].

Definition 2.2 (Gaussian Surface Area).

For a Borel set $A\subseteq\mathbb{R}^{d}$, its Gaussian surface area is defined by $\Gamma(A)\stackrel{\mathrm{def}}{=}\liminf_{\delta\to 0}\frac{\mathcal{N}(A_{\delta}\setminus A)}{\delta}$, where $A_{\delta}=\{x:\mathrm{dist}(x,A)\leq\delta\}$.
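For intuition, the ratio in this definition can be evaluated in closed form for a one-dimensional half-line $A=(-\infty,t]$, where $A_{\delta}\setminus A=(t,t+\delta]$ and the limit is the standard Gaussian density at $t$. The sketch below (function names are illustrative) computes the ratio for a small $\delta$ using only the standard library.

```python
import math

def std_normal_cdf(t):
    """CDF of N(0, 1), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def std_normal_pdf(t):
    """Density of N(0, 1)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def halfline_gsa_ratio(t, delta):
    """Gaussian mass of (t, t + delta], divided by delta, for A = (-inf, t]."""
    return (std_normal_cdf(t + delta) - std_normal_cdf(t)) / delta

for t in (0.0, 1.0, 2.0):
    # The ratio approaches the density at the boundary point t as delta -> 0.
    print(t, halfline_gsa_ratio(t, 1e-4), std_normal_pdf(t))
```

Each boundary point near the origin therefore contributes a constant amount of surface area, so a union of many intervals in one dimension can have large Gaussian surface area.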

Additional Background on the SQ Model The main fact that we use from the SQ literature [FGR+13, DKS17] concerns the family of distributions which are standard Gaussian along every direction, except in a low-dimensional subspace, where they are forced to be equal to some other (non-Gaussian) distribution $A$.

Definition 2.3 (Hidden Direction Distribution).

For an $m$-dimensional distribution $A$ and a matrix $\mathbf{V}\in\mathbb{R}^{m\times d}$, we define the distribution $P_{A,\mathbf{V}}$ with pdf $A(\mathbf{V}\mathbf{x})(2\pi)^{-\frac{(d-m)}{2}}e^{-\frac{1}{2}\|\mathbf{x}-\mathbf{V}^{\top}\mathbf{V}\mathbf{x}\|_{2}^{2}}$.
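One way to read this definition, assuming $\mathbf{V}$ has orthonormal rows ($\mathbf{V}\mathbf{V}^{\top}=\mathbf{I}_{m}$, as in Fact 2.4), is as a sampling recipe: draw $\mathbf{a}\sim A$, draw a standard Gaussian in $\mathbb{R}^{d}$, project the Gaussian onto the orthogonal complement of the row space of $\mathbf{V}$, and add $\mathbf{V}^{\top}\mathbf{a}$. The following sketch implements this recipe; the helper names and the choice of $A$ in the example are illustrative assumptions.

```python
import numpy as np

def random_orthonormal_rows(m, d, rng):
    """Return V in R^{m x d} with V V^T = I_m, via QR of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((d, m)))  # q has orthonormal columns
    return q.T

def sample_hidden_direction(sample_a, V, n, rng):
    """Draw n samples from P_{A,V}, assuming V has orthonormal rows.

    sample_a : hypothetical callable (n, rng) -> (n, m) array of draws from A
    """
    m, d = V.shape
    a = sample_a(n, rng)               # component inside the hidden subspace
    g = rng.standard_normal((n, d))
    g_perp = g - (g @ V.T) @ V         # standard Gaussian on the orthogonal complement
    return a @ V + g_perp

# Example with an obviously non-Gaussian A: uniform on {-1, +1}^2.
rng = np.random.default_rng(0)
V = random_orthonormal_rows(2, 10, rng)
signs = lambda n, rng: rng.choice([-1.0, 1.0], size=(n, 2))
X = sample_hidden_direction(signs, V, 500, rng)
print(np.abs(X @ V.T)[:3])             # projecting onto V recovers the draws from A
```

Projecting the samples back through $\mathbf{V}$ recovers the draws from $A$, while every direction orthogonal to the rows of $\mathbf{V}$ looks standard Gaussian.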

The main result from [DKS17] is that, if $A$ is similar to Gaussian, in the sense that its first few moments agree with those of $\mathcal{N}(\mathbf{0},\mathbf{I})$, then the hypothesis testing problem between $\mathcal{N}(\mathbf{0},\mathbf{I})$ and a distribution of the above family is hard for any SQ algorithm. The following fact formalizes this hardness. See Appendix A for related preliminaries and the proof of the fact below; $\chi^{2}(A,B)$ below is defined as $\int_{\mathbb{R}^{d}}A^{2}(\mathbf{x})/B(\mathbf{x})\,\mathrm{d}\mathbf{x}-1$.

Fact 2.4.

Let $d,k\in\mathbb{Z}$ and $m<d^{1/10}$ and $k<d^{c}$ for some sufficiently small constant $c>0$. Let $A$ be a distribution over $\mathbb{R}^{m}$ such that its first $k$ moments match the corresponding moments of $\mathcal{N}(\mathbf{0},\mathbf{I}_{m})$. Define the family $\mathcal{D}$ of distributions containing $P_{A,\mathbf{U}}$ (cf. Definition 2.3) for all matrices $\mathbf{U}\in\mathbb{R}^{m\times d}$ such that $\mathbf{U}\mathbf{U}^{\top}=\mathbf{I}_{m}$. Then, any SQ algorithm that distinguishes between $\mathcal{N}(\mathbf{0},\mathbf{I}_{d})$ and $\mathcal{D}$ requires either $2^{d^{\Omega(1)}}$ many queries, or at least one query with tolerance $d^{-\Omega(k)}\sqrt{\chi^{2}(A,\mathcal{N}(\mathbf{0},\mathbf{I}_{m}))}$.

3 SQ Lower Bound For Truncated Gaussians

In this section, we formalize the proof strategy for Theorem 1.2 that was informally described in Section 1.2. We will show a stronger version of that theorem, stated below, which concerns hypothesis testing between the standard Gaussian and a truncated Gaussian.

Theorem 3.1 (SQ Lower Bound; Hypothesis Testing Hardness).

Let $d,k\in\mathbb{Z}_{+}$, $\varepsilon>d^{-c}$ for some sufficiently small constant $c>0$, and $k\leq c/\varepsilon^{0.15}$. Let $\mathcal{C}$ be the class of all sets $S\subseteq\mathbb{R}^{d}$ with the properties that: (i) $S$ is a union of at most $k^{2}$-many $d$-dimensional rectangles, and (ii) $S$ has $O(1)$ Gaussian surface area and $\Omega(1)$ mass under the target Gaussian. Consider the hypothesis testing problem defined below:

  1. Null Hypothesis: $D=\mathcal{N}(\mathbf{0},\mathbf{I})$.

  2. Alternative Hypothesis: $P\in\mathcal{D}$, where $\mathcal{D}$ is the family of truncated Gaussians $\mathcal{N}(\bm{\mu},\mathbf{I},S)$ for all $\|\bm{\mu}\|_{2}\geq\varepsilon$ and $S\in\mathcal{C}$.

Then, any SQ algorithm that solves the above problem either performs $2^{d^{\Omega(1)}}$ many queries or performs at least one query with tolerance $d^{-\Omega(k)}$.

Note that Theorem 3.1 immediately implies Theorem 1.2 by a straightforward reduction: one can first find an estimate $\widehat{\bm{\mu}}$ approximating the true mean up to error $\varepsilon/2$ and then reject the null hypothesis if $\|\widehat{\bm{\mu}}\|>\varepsilon/2$.
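The reduction can be sketched in a few lines of Python. This is only an illustration: `estimate_mean` is a hypothetical black-box mean estimator, assumed to have $\ell_2$ error strictly below $\varepsilon/2$, and it is given samples here purely for concreteness (in the SQ model it would issue statistical queries instead).

```python
import math

def reject_null(estimate_mean, samples, eps):
    """Return True to reject the null hypothesis N(0, I).

    estimate_mean is a hypothetical estimator assumed to be accurate to
    within eps/2 in l2-norm. Under the null the true mean is 0, so the
    estimate has norm below eps/2; under the alternative the true mean has
    norm at least eps, so the estimate has norm above eps/2.
    """
    mu_hat = estimate_mean(samples)
    norm = math.sqrt(sum(v * v for v in mu_hat))
    return norm > eps / 2
```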

The main step towards showing Theorem 3.1 is to establish the existence of the following two-dimensional truncated Gaussian distribution $A$ that matches $k=\mathrm{poly}(1/\varepsilon)$ moments with the standard Gaussian (Proposition 3.2).

Proposition 3.2.

Let $c>0$ be a sufficiently small absolute constant, $\varepsilon\in(0,c)$, and $k=c/\varepsilon^{0.15}$. There exists a distribution $A$ on $\mathbb{R}^{2}$, for which the following are true:

  1. $A$ matches its first $k$ moments with the 2-dimensional standard Gaussian.

  2. $\chi^{2}(A,\mathcal{N}(\mathbf{0},\mathbf{I}_{2}))=O(1)$.

  3. Every distribution of the form $P_{A,\mathbf{V}}$ (cf. Definition 2.3) can be written as a truncated Gaussian $\mathcal{N}(\bm{\mu},\mathbf{\Sigma},S)$ for $\bm{\mu}=(\varepsilon,0)$, $\mathbf{\Sigma}=\mathbf{I}_{2}$ (the $2\times 2$ identity matrix) and some $S\subseteq\mathbb{R}^{d}$ which has mass (with respect to the target Gaussian) at least $1/2$ and Gaussian surface area at most $1$.

With Proposition 3.2 at hand, Theorem 3.1 (and hence Theorem 1.2) follows from Fact 2.4.

The proof of Proposition 3.2 requires two key results (Lemmata 3.3 and 3.4). In Proposition 3.2, $A$ is a 2-dimensional distribution that matches moments with the standard normal. In the following lemmata, we construct each dimension of that distribution independently. The marginal on the $y$-axis will be a standard normal, conditioned on a union of $k$ intervals, as shown in Lemma 3.3 below. As mentioned in the proof sketch of Section 1.2, we want these intervals to have small mass, thus we will eventually use $\delta=\sqrt{\varepsilon}$ below. We defer the proof to Section 4.

Lemma 3.3.

For any $\delta\in(0,1)$ and $k\in\mathbb{Z}_{+}$ there exists a set $U\subseteq\mathbb{R}$ such that: $U$ is a union of $k$ intervals with $\Pr_{y\sim\mathcal{N}(0,1)}[y\in U]=\delta$ and for all $t=1,\ldots,k$ it holds

$$\operatorname*{\mathbf{E}}_{y\sim\mathcal{N}(0,1)}[y^{t}\;|\;y\notin U]=\operatorname*{\mathbf{E}}_{y\sim\mathcal{N}(0,1)}[y^{t}]\;.\tag{1}$$
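As a small sanity check of the statement of Lemma 3.3 (not the construction used in its proof), a symmetric set $U=[-b,-a]\cup[a,b]$ with $0<a<1<b$ already satisfies (1) for $t=1,2,3$. Odd moments vanish by symmetry, and integration by parts gives $\int_a^b y^2\phi(y)\,\mathrm{d}y=\int_a^b\phi(y)\,\mathrm{d}y+a\phi(a)-b\phi(b)$, so the second moment matches whenever $a\phi(a)=b\phi(b)$. The Python sketch below finds such $a,b$ with prescribed mass $\delta$ by bisection. All function names are illustrative assumptions.

```python
import math

def phi(y):
    """Standard normal density."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def Phi(y):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

def partner(a):
    """For a in (0, 1), return b > 1 with b*phi(b) = a*phi(a)."""
    target = a * phi(a)
    lo, hi = 1.0, 40.0            # y*phi(y) is decreasing on [1, inf)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * phi(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def two_interval_set(delta):
    """Return (a, b) such that U = [-b,-a] u [a,b] has N(0,1)-mass delta."""
    lo, hi = 1e-12, 1.0           # the mass of U decreases as a grows to 1
    for _ in range(200):
        a = 0.5 * (lo + hi)
        if 2.0 * (Phi(partner(a)) - Phi(a)) > delta:
            lo = a
        else:
            hi = a
    a = 0.5 * (lo + hi)
    return a, partner(a)

def interval_moment(t, lo, hi):
    """Integral of y^t * phi(y) over [lo, hi], by integration by parts."""
    m = [Phi(hi) - Phi(lo), phi(lo) - phi(hi)]
    for j in range(2, t + 1):
        m.append((j - 1) * m[j - 2]
                 + lo ** (j - 1) * phi(lo) - hi ** (j - 1) * phi(hi))
    return m[t]

def moment_outside_U(t, a, b):
    """E[y^t | y not in U] for U = [-b,-a] u [a,b] and t in {0, 1, 2, 3}."""
    gauss = {0: 1.0, 1: 0.0, 2: 1.0, 3: 0.0}[t]
    on_U = interval_moment(t, a, b) + interval_moment(t, -b, -a)
    mass = interval_moment(0, a, b) + interval_moment(0, -b, -a)
    return (gauss - on_U) / (1.0 - mass)
```

For example, `two_interval_set(0.3)` returns endpoints for which `moment_outside_U(t, a, b)` agrees with the standard normal moments for `t = 1, 2, 3` up to floating-point error. Matching more moments requires more intervals, which is what the lemma provides.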

Next we construct the marginal of $A$ for the $x$-axis. In this case, we start with a Gaussian distribution with mean $\varepsilon$, and we reweight it with a $k$-piecewise constant function taking values in $\{1-\delta,1\}$ so that it matches $k$ moments with the standard normal. We use values in $\{1-\delta,1\}$ because we have removed $\delta$ mass in our construction for the $y$-axis. This will become clearer when we provide the calculation that $A$ matches moments with the 2-dimensional Gaussian. The proof can be found in Section 5.

Lemma 3.4.

Let $c>0$ be a sufficiently small absolute constant and $\varepsilon\in(0,c)$. Let $\delta,k$ be parameters so that $\delta=\sqrt{\varepsilon}$ and $k=c/\varepsilon^{0.15}$. There exists a set $T\subseteq\mathbb{R}$ such that: $T$ is a union of $k$ intervals, $\Pr_{x\sim\mathcal{N}(\varepsilon,1)}[x\in T]\leq\varepsilon^{0.3}$, and for all $t=0,\ldots,k$ it holds

$$\operatorname*{\mathbf{E}}_{x\sim\mathcal{N}(\varepsilon,1)}[x^{t}(1-\delta\mathds{1}\{x\in T\})]\,Z^{-1}=\operatorname*{\mathbf{E}}_{x\sim\mathcal{N}(0,1)}[x^{t}]\;,\tag{2}$$

where $Z=\operatorname*{\mathbf{E}}_{x\sim\mathcal{N}(\varepsilon,1)}[1-\delta\mathds{1}\{x\in T\}]$.

Using Lemmata 3.3 and 3.4, we can prove Proposition 3.2 by letting

$$A(x,y)=\frac{\phi(x-\varepsilon)\phi(y)\mathds{1}((x,y)\notin T\times U)}{Z}\;.$$

In particular, it can be seen that $A$ matches $k$ moments with the $2$-dimensional normal by a direct computation that uses (1), (2). We provide the proof below.

Proof of Proposition 3.2.

Let $T,U,\delta,Z$ be as in Lemmata 3.3 and 3.4. We let $A$ be the distribution defined by the following probability density function:

$$A(x,y)=\frac{\phi(x-\varepsilon)\phi(y)\mathds{1}((x,y)\notin T\times U)}{Z}\;.\tag{3}$$

We start with Item 1. First we note that $A$ is indeed a valid distribution, i.e., the normalizing factor is correct:

\begin{align*}
\int\limits_{-\infty}^{+\infty}\int\limits_{-\infty}^{+\infty}\phi(x-\varepsilon)\phi(y)\mathds{1}((x,y)\notin T\times U)\,\mathrm{d}y\,\mathrm{d}x
&=\int\limits_{-\infty}^{+\infty}\phi(x-\varepsilon)\left(\mathds{1}(x\in T)\int\limits_{y\notin U}\phi(y)\,\mathrm{d}y+\mathds{1}(x\notin T)\int\limits_{-\infty}^{+\infty}\phi(y)\,\mathrm{d}y\right)\mathrm{d}x\\
&=\int\limits_{-\infty}^{+\infty}\phi(x-\varepsilon)\left(\mathds{1}(x\in T)(1-\delta)+\mathds{1}(x\notin T)\right)\mathrm{d}x && \text{(by Lemma 3.3)}\\
&=\int\limits_{-\infty}^{+\infty}\phi(x-\varepsilon)(1-\delta\mathds{1}(x\in T))\,\mathrm{d}x=Z\;,
\end{align*}

where the calculation essentially used the geometry of our sets (see also Figure 1): For any fixed $x$, there are two cases. If $x\notin T$, then no Gaussian mass is removed from the $y$-integral; otherwise, a $\delta$ mass is removed, leaving mass $1-\delta$ (as explained in Lemma 3.3).
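The normalization identity can be checked numerically. The Python sketch below compares the closed form $1-\delta\Pr_{x\sim\mathcal{N}(\varepsilon,1)}[x\in T]$ with a Monte Carlo estimate of the mass that survives after removing $T\times U$. The sets used in the usage note are arbitrary illustrative unions of intervals, not the sets produced by Lemmata 3.3 and 3.4, and all names are assumptions.

```python
import math
import random

def Phi(y):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

def mass(intervals, mean=0.0):
    """Mass of a union of disjoint intervals under N(mean, 1)."""
    return sum(Phi(hi - mean) - Phi(lo - mean) for lo, hi in intervals)

def contains(intervals, v):
    """True if v lies in one of the intervals."""
    return any(lo <= v <= hi for lo, hi in intervals)

def Z_formula(T, U, eps):
    """Z = E_{x ~ N(eps,1)}[1 - delta * 1{x in T}] with delta = Pr[y in U]."""
    delta = mass(U)
    return 1.0 - delta * mass(T, mean=eps)

def Z_monte_carlo(T, U, eps, n, rng):
    """Fraction of samples from N(eps,1) x N(0,1) that avoid T x U."""
    kept = 0
    for _ in range(n):
        x = rng.gauss(eps, 1.0)
        y = rng.gauss(0.0, 1.0)
        if not (contains(T, x) and contains(U, y)):
            kept += 1
    return kept / n
```

With, say, `T = [(0.5, 1.0)]`, `U = [(-0.2, 0.2), (1.0, 1.5)]` and `eps = 0.1`, the two quantities should agree up to sampling error.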

We can similarly see that $A$ matches the first $k$ moments with the standard two-dimensional Gaussian: Let $t$ and $s$ be non-negative integers with $t+s\leq k$. Then,

\begin{align}
\frac{1}{Z}&\int\limits_{-\infty}^{+\infty}\int\limits_{-\infty}^{+\infty}x^{t}y^{s}\phi(x-\varepsilon)\phi(y)\mathds{1}((x,y)\notin T\times U)\,\mathrm{d}y\,\mathrm{d}x \tag{4}\\
&=\frac{1}{Z}\int\limits_{-\infty}^{+\infty}x^{t}\phi(x-\varepsilon)\left(\mathds{1}(x\in T)\int\limits_{y\notin U}y^{s}\phi(y)\,\mathrm{d}y+\mathds{1}(x\notin T)\int\limits_{-\infty}^{+\infty}y^{s}\phi(y)\,\mathrm{d}y\right)\mathrm{d}x \tag{5}\\
&=\frac{1}{Z}\int\limits_{-\infty}^{+\infty}x^{t}\phi(x-\varepsilon)\int\limits_{-\infty}^{+\infty}y^{s}\phi(y)\left(\mathds{1}(x\in T)(1-\delta)+\mathds{1}(x\notin T)\right)\mathrm{d}y\,\mathrm{d}x && \text{(by Lemma 3.3)} \notag\\
&=\int\limits_{-\infty}^{+\infty}x^{t}\,\frac{\phi(x-\varepsilon)(1-\delta\mathds{1}(x\in T))}{Z}\,\mathrm{d}x\int\limits_{-\infty}^{+\infty}y^{s}\phi(y)\,\mathrm{d}y \tag{6}\\
&=\int\limits_{-\infty}^{+\infty}x^{t}\phi(x)\,\mathrm{d}x\int\limits_{-\infty}^{+\infty}y^{s}\phi(y)\,\mathrm{d}y\;. && \text{(by Lemma 3.4)} \notag
\end{align}

Finally, it is easy to see that the chi-square divergence of $A$ from the standard Gaussian is $O(1)$. Since $A$ carries the normalizing factor $1/Z$ from (3), we have

\begin{align*}
\chi^{2}(A,\mathcal{N}(\mathbf{0},\mathbf{I}_{2}))
&=\int\limits_{-\infty}^{+\infty}\int\limits_{-\infty}^{+\infty}\frac{A^{2}(x,y)}{\phi(x)\phi(y)}\,\mathrm{d}y\,\mathrm{d}x-1\\
&=\frac{1}{Z^{2}}\int\limits_{-\infty}^{+\infty}\frac{\phi^{2}(x-\varepsilon)}{\phi(x)}\int\limits_{-\infty}^{+\infty}\phi(y)\mathds{1}((x,y)\notin T\times U)\,\mathrm{d}y\,\mathrm{d}x-1\\
&\leq\frac{1}{Z^{2}}\int\limits_{-\infty}^{+\infty}\frac{\phi^{2}(x-\varepsilon)}{\phi(x)}\,\mathrm{d}x-1=\frac{1+\chi^{2}(\mathcal{N}(\varepsilon,1),\mathcal{N}(0,1))}{Z^{2}}-1=\frac{e^{\varepsilon^{2}}}{Z^{2}}-1=O(1)\;,
\end{align*}

where the last step uses $Z\geq 1-\delta$.
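The Gaussian integral in the last line has the closed form $\int\phi^{2}(x-\varepsilon)/\phi(x)\,\mathrm{d}x=e^{\varepsilon^{2}}$, which follows by completing the square in the exponent. The short Python sketch below checks this numerically with the trapezoid rule. The integration window and grid size are arbitrary choices made for illustration.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def chi_square_integral(eps, lo=-12.0, hi=12.0, n=200000):
    """Trapezoid-rule estimate of the integral of phi(x-eps)^2 / phi(x)."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0   # half weight at the endpoints
        total += weight * phi(x - eps) ** 2 / phi(x)
    return total * h
```

For small `eps` the value is close to $1+\varepsilon^{2}$, so this factor contributes only a constant to the bound.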

We move to Item 3. The fact that $P_{A,\mathbf{V}}$ is a truncated Gaussian follows trivially by our definitions. The Gaussian surface area bound comes from the fact that $T\times U$ is a union of at most $k^{2}$ many rectangles, each with perimeter $O(\delta)$ (this is because the sets $T$ and $U$ from Lemmata 3.3 and 3.4 have mass at most $O(\delta)$). Using $\delta=\sqrt{\varepsilon}$ and $k\ll 1/\varepsilon^{1/4}$, we obtain that the Gaussian surface area is at most $1$. ∎
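For concreteness, the arithmetic behind the last step can be written out as follows, taking the $O(\delta)$ contribution per rectangle at face value and plugging in $k=c/\varepsilon^{0.15}$ from Proposition 3.2. The symbol $\Gamma(S)$ for the Gaussian surface area of $S$ is notation introduced here only for this display.

```latex
\Gamma(S)\;\leq\; k^{2}\cdot O(\delta)
\;=\; O\!\left(\frac{c^{2}}{\varepsilon^{0.3}}\cdot\varepsilon^{1/2}\right)
\;=\; O\!\left(c^{2}\,\varepsilon^{0.2}\right)\;\leq\; 1 ,
```

where the final inequality holds once the absolute constant $c$ is chosen sufficiently small, since $\varepsilon<c<1$.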

4 Proof of Lemma 3.3

Regarding Lemma 3.3, we need to find a union $U$ of $k$ intervals such that the truncated version of $\mathcal{N}(0,1)$ on these intervals matches moments with $\mathcal{N}(0,1)$, and the mass of $U$ under $\mathcal{N}(0,1)$ is equal to a parameter $\delta$ of our choice. The proof strategy is the following: First, we note in Claim 4.1 that it suffices to find a piecewise constant function $f:\mathbb{R}\to\{-\delta,1-\delta\}$ such that $\operatorname*{\mathbf{E}}_{z\sim\mathcal{N}(0,1)}[z^{t}f(z)]=0$, i.e., the moments of $\mathcal{N}(0,1)$ weighted by $f$ are zero. Claim 4.1 implies that once such a function $f$ is found, Lemma 3.3 follows by letting the set $U$ be the union of all the intervals where $f(z)>0$. We proceed to show the existence of $f$ through a two-step process. We start with an explicit construction in Claim 4.2. Although capable of making the weighted moments arbitrarily close to zero, this construction yields a function with a significantly larger number of pieces than $k$. We are then able to reduce the number of pieces down to the desired count of $k$ using a technique from [DKZ20], implemented in Proposition 4.3.

Claim 4.1.

Let $U\subseteq\mathbb{R}$ be a set and $t\in\mathbb{Z}_{+}$ be an integer. Define the piecewise constant function

$$f(z)=\begin{cases}1-\delta,&z\in U\\ -\delta,&z\notin U\;,\end{cases}$$

with $\delta:=\Pr_{z\sim\mathcal{N}(0,1)}[z\in U]$. The following three statements are equivalent:

  1. $\operatorname*{\mathbf{E}}_{z\sim\mathcal{N}(0,1)}[z^{t}f(z)]=0$.

  2. $\operatorname*{\mathbf{E}}_{z\sim\mathcal{N}(0,1)}[z^{t}\,|\,z\in U]=\operatorname*{\mathbf{E}}_{z\sim\mathcal{N}(0,1)}[z^{t}]$.

  3. $\operatorname*{\mathbf{E}}_{z\sim\mathcal{N}(0,1)}[z^{t}\,|\,z\notin U]=\operatorname*{\mathbf{E}}_{z\sim\mathcal{N}(0,1)}[z^{t}]$.

Proof.

We first show the equivalence between Item 2 and Item 3. We assume Item 2 and show Item 3 (the other direction is identical):

\begin{align}
\operatorname*{\mathbf{E}}_{z\sim\mathcal{N}(0,1)}[z^{t}]
&=\operatorname*{\mathbf{E}}_{z\sim\mathcal{N}(0,1)}[z^{t}\mathds{1}(z\in U)]+\operatorname*{\mathbf{E}}_{z\sim\mathcal{N}(0,1)}[z^{t}\mathds{1}(z\notin U)] \notag\\
&=\delta\operatorname*{\mathbf{E}}_{z\sim\mathcal{N}(0,1)}[z^{t}\,|\,z\in U]+(1-\delta)\operatorname*{\mathbf{E}}_{z\sim\mathcal{N}(0,1)}[z^{t}\,|\,z\notin U] \tag{7}\\
&=\delta\operatorname*{\mathbf{E}}_{z\sim\mathcal{N}(0,1)}[z^{t}]+(1-\delta)\operatorname*{\mathbf{E}}_{z\sim\mathcal{N}(0,1)}[z^{t}\,|\,z\notin U]\;, \notag
\end{align}

where the last line used Item 2. Rearranging, this means that $\operatorname*{\mathbf{E}}_{z\sim\mathcal{N}(0,1)}[z^{t}\,|\,z\notin U]=\operatorname*{\mathbf{E}}_{z\sim\mathcal{N}(0,1)}[z^{t}]$.

We now prove that Item 2 and Item 3 imply Item 1, i.e., $\operatorname*{\mathbf{E}}_{z\sim\mathcal{N}(0,1)}[z^{t}f(z)]=0$.

\begin{align}
\frac{1}{\delta(1-\delta)}\operatorname*{\mathbf{E}}_{z\sim\mathcal{N}(0,1)}[z^{t}f(z)]
&=\frac{1}{\delta}\operatorname*{\mathbf{E}}_{z\sim\mathcal{N}(0,1)}[z^{t}\mathds{1}(z\in U)]-\frac{1}{1-\delta}\operatorname*{\mathbf{E}}_{z\sim\mathcal{N}(0,1)}[z^{t}\mathds{1}(z\notin U)] \notag\\
&=\operatorname*{\mathbf{E}}_{z\sim\mathcal{N}(0,1)}[z^{t}\,|\,z\in U]-\operatorname*{\mathbf{E}}_{z\sim\mathcal{N}(0,1)}[z^{t}\,|\,z\notin U]=0\;, \tag{8}
\end{align}

where the last equality is due to the part we showed earlier.

The direction from Item 1 to Item 2 is similar: By writing out $\operatorname*{\mathbf{E}}_{z\sim\mathcal{N}(0,1)}[z^{t}f(z)]=0$ similarly to (8), we can see that $\operatorname*{\mathbf{E}}_{z\sim\mathcal{N}(0,1)}[z^{t}\mid z\in U]=\operatorname*{\mathbf{E}}_{z\sim\mathcal{N}(0,1)}[z^{t}\mid z\not\in U]$. Then, using this in (7), we get that both conditional expectations are equal to $\operatorname*{\mathbf{E}}_{z\sim\mathcal{N}(0,1)}[z^{t}]$. ∎

Next we explicitly construct a piecewise constant function from $\mathbb{R}$ to $\{-\delta,1-\delta\}$ that achieves zero weighted moments (or, more accurately, arbitrarily small weighted moments).

Claim 4.2.

For any $\eta>0$ and $\delta\in(0,1)$, there exists a $(k\log(1/\eta))^{2k}/\eta$-piecewise constant function $g_{\eta}:\mathbb{R}\to\{1-\delta,-\delta\}$ such that $|\operatorname*{\mathbf{E}}_{z\sim\mathcal{N}(0,1)}[g_{\eta}(z)z^{t}]|\leq\eta$ for all $t=0,1,\ldots,k$, and $\Pr_{z\sim\mathcal{N}(0,1)}[g_{\eta}(z)=1-\delta]\in[\delta-\eta,\delta+\eta]$.

We sketch the proof of Claim 4.2, with the full version being deferred to Section 4.1. The idea is that we partition the real line into intervals $A_{i}=[is,(i+1)s]$ for $i\in\mathbb{Z}$ using a small step size $s$. For each $i\in\mathbb{Z}$, we further split $A_{i}$ into two parts $A_{i}^{+}:=[is,(i+\delta)s]$ and $A_{i}^{-}:=[(i+\delta)s,(i+1)s]$, i.e., the ratio of the sub-intervals' lengths is $\delta/(1-\delta)$. We define $g_{\eta}(z)=1-\delta$ on $A_{i}^{+}$ and $g_{\eta}(z)=-\delta$ on $A_{i}^{-}$ for all $i$.
The main argument is that since the Gaussian density does not change by much inside $A_{i}$, the contributions to the moment integral from the sub-intervals $A_{i}^{+}$ and $A_{i}^{-}$ must approximately adhere to the ratio of the sub-intervals' lengths, i.e.,

\begin{align}
\frac{\int_{A_{i}^{+}}z^{t}\phi(z)\mathrm{d}z}{\int_{A_{i}^{-}}z^{t}\phi(z)\mathrm{d}z}=\frac{\delta}{1-\delta}(1+\xi_{i}) \tag{9}
\end{align}

for some small $\xi_{i}$. In fact, we can show it for $|\xi_{i}|=O(|i|s^{2})$ using a polynomial approximation for the density function $\phi(z)$. The important part is that we can control $|\xi_{i}|$ using the step size $s$. Finally,

\begin{align*}
\left|\operatorname*{\mathbf{E}}_{z\sim\mathcal{N}(0,1)}[g_{\eta}(z)z^{t}]\right|
&=\left|\sum_{i\in\mathbb{Z}}\int_{z\in A_{i}}g_{\eta}(z)z^{t}\phi(z)\mathrm{d}z\right|\leq\sum_{i\in\mathbb{Z}}\left|(1-\delta)\int_{z\in A_{i}^{+}}z^{t}\phi(z)\mathrm{d}z-\delta\int_{z\in A_{i}^{-}}z^{t}\phi(z)\mathrm{d}z\right|\\
&\leq\sum_{i\in\mathbb{Z}}\left|\xi_{i}\,\delta\int_{z\in A_{i}^{-}}z^{t}\phi(z)\mathrm{d}z\right|<\eta\;. \tag{using (9)}
\end{align*}

The last step above amounts to choosing a sufficiently small $s$, so that each $\xi_{i}$ becomes small enough to make the entire right-hand side less than $\eta$. There are additional details needed to formalize this, such as noting that the summation does not need to cover the entire range of $i\in\mathbb{Z}$. We defer these details to Section 4.1.
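As a rough numerical sanity check of this sketch (not a substitute for the proof), the following Python snippet evaluates the weighted moments of the interleaved function on the partition $A_i=[is,(i+1)s]$ described above. The helper names `moment_integral` and `weighted_moment`, the cutoff $|z|\leq 12$, and the sample values of `s` and `delta` are illustrative assumptions and do not come from the paper; in particular, `s` is far coarser than the step size fixed in Section 4.1.

```python
import math

def phi(z):
    """Standard Gaussian density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard Gaussian CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def moment_integral(a, b, t):
    """Integral of z^t * phi(z) over [a, b], via integration by parts:
    I_t = a^(t-1) phi(a) - b^(t-1) phi(b) + (t-1) I_(t-2)."""
    if t == 0:
        return Phi(b) - Phi(a)
    if t == 1:
        return phi(a) - phi(b)
    boundary = a ** (t - 1) * phi(a) - b ** (t - 1) * phi(b)
    return boundary + (t - 1) * moment_integral(a, b, t - 2)

def weighted_moment(t, s, delta, cutoff=12.0):
    """Approximate E[g(z) z^t], where g = 1 - delta on [i*s, (i+delta)*s]
    and g = -delta on [(i+delta)*s, (i+1)*s]; the Gaussian mass outside
    [-cutoff, cutoff] is ignored (an assumption of this sketch)."""
    n = int(round(cutoff / s))
    total = 0.0
    for i in range(-n, n):
        lo, mid, hi = i * s, (i + delta) * s, (i + 1) * s
        total += (1.0 - delta) * moment_integral(lo, mid, t)
        total -= delta * moment_integral(mid, hi, t)
    return total

if __name__ == "__main__":
    for t in range(5):
        print(t, weighted_moment(t, s=0.25, delta=0.3))
```

Since $\operatorname*{\mathbf{E}}[g(z)]=\Pr[g(z)=1-\delta]-\delta$, the value reported for $t=0$ also measures how close the mass of the positive pieces is to $\delta$.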

The final step is to reduce the number of pieces from $(k\log(1/\eta))^{2k}/\eta$ down to $k$. To this end, we use the proposition below, which shows that we can start with a $t$-piecewise constant function with $t>k$ and decrease the number of pieces to $k$ without changing the desired properties of the function. An analogous statement was shown in [DKZ20]; here we require a generalization of this for all continuous distributions and any sequence of moments. The main idea of the proof is to model the movement of the endpoints of the intervals through a differential equation. To do so, we start with an instance that has many more endpoints than our goal, i.e., the instance has $t$ endpoints, and the first $k$ moments of this distribution have specific values. One can model this as a vector-valued function $\mathbf{M}(z_{1},\ldots,z_{t}):\mathbb{R}^{t}\mapsto\mathbb{R}^{k}$, where $z_{1},\ldots,z_{t}$ are the endpoints and $\mathbf{M}_{i}$ is the value of the $i$-th moment. Our task is to move the endpoints $z_{i}$ until two of them coincide or one of them goes to infinity, while keeping the vector $\mathbf{M}$ constant (so that the moments will continue to satisfy our assumptions).
This is achieved by finding a specific $\mathbf{u}(z):\mathbb{R}\mapsto\mathbb{R}^{t}$ with the properties that $\mathbf{u}(0)=[z_{1},\ldots,z_{t}]$ (so that the initial conditions satisfy our moment assumptions), $\mathrm{d}\mathbf{M}(\mathbf{u}_{1}(z),\ldots,\mathbf{u}_{t}(z))/\mathrm{d}z=\mathbf{0}$ (so that the moments remain constant), and $\mathrm{d}\mathbf{u}_{t}(z)/\mathrm{d}z=1$ (so that at least one endpoint will be removed, i.e., in the worst case the $t$-th endpoint goes to infinity). One can show that such a function $\mathbf{u}$ always exists, as long as $t>k+1$. For completeness, we provide a proof in Appendix B.1.
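To make the endpoint-moving idea concrete, here is a toy Python sketch with three endpoints and two preserved moments (the mass and the first moment) under $\mathcal{N}(0,1)$. It is only an illustration under stated assumptions: the function takes the value $b$ on $(z_1,z_2)\cup(z_3,+\infty)$ and $a$ elsewhere, the velocities of $z_1,z_2$ are obtained from a hand-derived $2\times 2$ linear system with $\mathrm{d}z_3/\mathrm{d}z=1$, and a single explicit Euler step is taken. The names `moments` and `velocities`, the numerical cutoff `upper`, and all sample values are hypothetical; the general argument is the one in Appendix B.1.

```python
import math

def phi(z):
    """Standard Gaussian density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard Gaussian CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def moments(z1, z2, z3, a, b, upper=40.0):
    """(E[f], E[f z]) for f = b on (z1, z2) and (z3, +inf), f = a elsewhere.
    The tail is cut at `upper`, where the Gaussian density is negligible."""
    mass_U = (Phi(z2) - Phi(z1)) + (Phi(upper) - Phi(z3))
    first_U = (phi(z1) - phi(z2)) + (phi(z3) - phi(upper))
    # Under N(0,1): E[1] = 1 and E[z] = 0, so only the set U contributes (b - a).
    return (a + (b - a) * mass_U, (b - a) * first_U)

def velocities(z1, z2, z3):
    """Endpoint velocities (v1, v2, v3) with v3 = 1 that keep both moments
    constant to first order. With sigma_i = -1 for a left endpoint of U and
    +1 for a right endpoint, they solve
        sum_i sigma_i * z_i^j * phi(z_i) * v_i = 0   for j = 0, 1."""
    sigma1, sigma2, sigma3 = -1.0, 1.0, -1.0
    w3 = sigma3 * phi(z3)                      # contribution of the driven endpoint
    w1 = w3 * (z3 - z2) / (z2 - z1)            # closed-form solution of the 2x2 system
    w2 = -w3 * (z3 - z1) / (z2 - z1)
    return (w1 / (sigma1 * phi(z1)), w2 / (sigma2 * phi(z2)), 1.0)

def drift(z_old, z_new, a, b):
    """Largest absolute change among the two tracked moments."""
    m_old, m_new = moments(*z_old, a, b), moments(*z_new, a, b)
    return max(abs(x - y) for x, y in zip(m_old, m_new))

delta, h = 0.3, 1e-3
a, b = -delta, 1.0 - delta
z = (-0.5, 0.3, 1.0)
v = velocities(*z)
z_ode = tuple(zi + h * vi for zi, vi in zip(z, v))   # one Euler step along the flow
z_naive = (z[0], z[1], z[2] + h)                     # move only the last endpoint

drift_ode = drift(z, z_ode, a, b)
drift_naive = drift(z, z_naive, a, b)
```

In this toy instance the moments change only at second order in the step size `h` along the computed direction, whereas moving the last endpoint alone changes them at first order.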

Proposition 4.3.

Let $k,\ell$ be positive integers with $\ell\geq k+1$ and $a,b\in\mathbb{R}$ with $b>a$. Let $D$ be a continuous distribution over $\mathbb{R}$ and let $\nu_{0},\ldots,\nu_{k-1}\in\mathbb{R}$. If for any $\eta>0$ there exists an at most $\ell$-piecewise constant function $g_{\eta}:\mathbb{R}\to\{a,b\}$ such that $|\operatorname*{\mathbf{E}}_{z\sim D}[g_{\eta}(z)z^{t}]-\nu_{t}|\leq\eta$ for every non-negative integer $t<k$, then there exists an at most $(k+1)$-piecewise constant function $f:\mathbb{R}\to\{a,b\}$ such that $\operatorname*{\mathbf{E}}_{z\sim D}[f(z)z^{t}]=\nu_{t}$, for every non-negative integer $t<k$.

Having the above at hand, the proof of Lemma 3.3 follows from Claim 4.2 and Proposition 4.3 applied with $a=-\delta$, $b=1-\delta$, $D=\mathcal{N}(0,1)$. The set $U$ that satisfies the conclusion of Lemma 3.3 is the union of the intervals on which $f(z)>0$.

4.1 Proof of Claim 4.2

Let $C$ be a sufficiently large absolute constant. Fix the parameters $s:=0.01\eta/(k\log(1/\eta))^{k}$, $i_{\max}=10\log^{k}(1/\eta)/s$ throughout the proof. We also define $U^{+}$ and $U^{-}$ to be the unions of intervals in the positive and negative part of the real line as shown below. We define them so that their union $U=U^{+}\cup U^{-}$ is symmetric around zero:

\begin{align*}
U^{+}&:=\left(\bigcup_{i=0}^{i_{\max}-1}[is,(i+\delta)s]\right)\cup[i_{\max}s,+\infty)\;,\\
U^{-}&:=\left(\bigcup_{i=0}^{i_{\max}-1}[-(i+\delta)s,-is]\right)\cup(-\infty,-i_{\max}s]\;.
\end{align*}

Finally define the piecewise constant function

\begin{align*}
g_{\eta}(z):=\begin{cases}1-\delta&,z\in U\\ -\delta&,z\not\in U\;.\end{cases}
\end{align*}

First, we note that because of the symmetry of $g_{\eta}(z)$ around zero,

\begin{align}
\left|\operatorname*{\mathbf{E}}_{z\sim\mathcal{N}(0,1)}[g_{\eta}(z)z^{t}]\right|\leq\left|\int_{-\infty}^{0}g_{\eta}(z)z^{t}\phi(z)\mathrm{d}z\right|+\left|\int_{0}^{+\infty}g_{\eta}(z)z^{t}\phi(z)\mathrm{d}z\right|=2\left|\int_{0}^{+\infty}g_{\eta}(z)z^{t}\phi(z)\mathrm{d}z\right|\;. \tag{10}
\end{align}

Therefore, in everything that follows, it suffices to only consider the integral on the positive part of the real line.
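For readers who prefer code, the definition of $g_{\eta}$ and its symmetry around zero can be sketched in Python as follows. This is a minimal illustration: the function name `g_eta` and its argument names are hypothetical, and boundary points between pieces are assigned arbitrarily (they have Gaussian measure zero).

```python
import math

def g_eta(z, s, delta, i_max):
    """Piecewise constant function: 1 - delta on U, -delta outside U."""
    x = abs(z)                           # U is symmetric around zero
    if x >= i_max * s:                   # the two tails belong to U
        return 1.0 - delta
    offset = x / s - math.floor(x / s)   # relative position inside [i*s, (i+1)*s)
    return 1.0 - delta if offset <= delta else -delta
```

For example, with `s = 0.5` and `delta = 0.3`, the point `z = 0.1` lies in $[0,\delta s]$ and is mapped to $1-\delta$, while `z = 0.3` lies in $[\delta s,s]$ and is mapped to $-\delta$.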

Our goal is to bound (10) by $\eta$. As a first step, we need the following bound on the ratio of consecutive pieces of the moment integral:

\begin{align}
\frac{\int_{is}^{(i+\delta)s}z^{t}\phi(z)\mathrm{d}z}{\int_{(i+\delta)s}^{(i+1)s}z^{t}\phi(z)\mathrm{d}z}\leq\frac{\delta s((i+\delta)s)^{t}\phi(is)}{(1-\delta)s((i+\delta)s)^{t}\phi((i+1)s)}\leq\frac{\delta}{1-\delta}e^{is^{2}+s^{2}/2}\leq\frac{\delta}{1-\delta}(1+2is^{2})\;, \tag{11}
\end{align}

where we used the minimum and maximum values that $z^{t}\phi(z)$ takes in each integral, and then used that $1+x\leq e^{x}\leq 1+1.1x$ for all $0\leq x<0.1$, where we applied this with $x=is^{2}\leq i_{\max}s^{2}$, which is indeed less than $0.1$ for our choice of $s,i_{\max}$.
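The chain of inequalities in (11) can be spot-checked numerically. The Python sketch below integrates $z^{t}\phi(z)$ over the two sub-intervals with composite Simpson's rule (a technique chosen here for convenience, not one used in the paper) and returns the ratio together with the two upper bounds from (11). The function names and the sample values of `s`, `delta`, `i`, and `t` are illustrative assumptions.

```python
import math

def phi(z):
    """Standard Gaussian density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def simpson(f, a, b, n=16):
    """Composite Simpson's rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = f(a) + f(b)
    for j in range(1, n):
        total += (4.0 if j % 2 else 2.0) * f(a + j * h)
    return total * h / 3.0

def ratio_and_bounds(i, s, delta, t):
    """Return (ratio, exponential bound, linear bound) from inequality (11)."""
    def integrand(z):
        return z ** t * phi(z)
    numerator = simpson(integrand, i * s, (i + delta) * s)
    denominator = simpson(integrand, (i + delta) * s, (i + 1) * s)
    base = delta / (1.0 - delta)
    exp_bound = base * math.exp(i * s * s + s * s / 2.0)
    lin_bound = base * (1.0 + 2.0 * i * s * s)
    return numerator / denominator, exp_bound, lin_bound

if __name__ == "__main__":
    for i in (1, 10, 50):
        print(i, ratio_and_bounds(i, s=0.01, delta=0.3, t=2))
```

The check is run only for a few indices $i\geq 1$ and small $t$; it illustrates the inequality on sample values and proves nothing in general.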

We can now proceed to bound (10). We start with the upper bound; see below for step-by-step explanations:

\begin{align}
\int_{0}^{+\infty}g_{\eta}(z)z^{t}\phi(z)\mathrm{d}z
&=(1-\delta)\int_{i_{\max}s}^{+\infty}z^{t}\phi(z)\mathrm{d}z+\sum_{i=0}^{i_{\max}-1}\int_{is}^{(i+1)s}g_{\eta}(z)z^{t}\phi(z)\mathrm{d}z \notag\\
&\leq\frac{\eta}{4}+\sum_{i=0}^{i_{\max}-1}\left\{(1-\delta)\int_{is}^{(i+\delta)s}z^{t}\phi(z)\mathrm{d}z-\delta\int_{(i+\delta)s}^{(i+1)s}z^{t}\phi(z)\mathrm{d}z\right\} \tag{12}\\
&\leq\frac{\eta}{4}+\sum_{i=0}^{i_{\max}-1}\left\{(1-\delta)\frac{\delta}{1-\delta}(1+2is^{2})\int_{(i+\delta)s}^{(i+1)s}z^{t}\phi(z)\mathrm{d}z-\delta\int_{(i+\delta)s}^{(i+1)s}z^{t}\phi(z)\mathrm{d}z\right\} \tag{13}\\
&=\frac{\eta}{4}+\sum_{i=0}^{i_{\max}-1}\left\{2i\delta s^{2}\int_{(i+\delta)s}^{(i+1)s}z^{t}\phi(z)\mathrm{d}z\right\} \notag\\
&\leq\frac{\eta}{4}+20\delta s\log^{k}(1/\eta)\sum_{i=0}^{i_{\max}-1}\int_{(i+\delta)s}^{(i+1)s}z^{t}\phi(z)\mathrm{d}z \tag{14}\\
&\leq\frac{\eta}{4}+20\delta s\log^{k}(1/\eta)\int_{0}^{+\infty}z^{t}\phi(z)\mathrm{d}z \tag{15}\\
&\leq\frac{\eta}{4}+20s\log^{k}(1/\eta)(t-1)!! \tag{16}\\
&\leq\frac{\eta}{4}+\frac{\eta}{4}=\frac{\eta}{2}\;. \tag{17}
\end{align}

We now justify each step in the above derivations. (12) uses the Gaussian concentration inequality $\Pr[z^{t}>\beta]\leq e^{-\beta^{2/t}/2}$ for $\beta=i_{\max}s=10\log^{k}(1/\eta)$. (13) holds because of (11). (14) follows because $is\leq 10\log^{k}(1/\eta)$. (15) holds because the integrand is non-negative on the positive half-line, so extending the integration to all of $[0,+\infty)$ can only increase the value. (16) uses the Gaussian moment bound together with $\delta\leq 1$. (17) holds because $(t-1)!!\leq t^{t}\leq k^{k}$ and the choice $s:=0.01\eta/(k^{k}\log^{k}(1/\eta))$.
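The arithmetic behind the parameter choices can be checked with a few lines of Python. The sketch below computes $s$ and $i_{\max}$ for sample values of $\eta$ and $k$ (chosen arbitrarily for illustration), and evaluates the two quantities the argument relies on: $i_{\max}s^{2}$, which must stay below $0.1$ for (11), and $20s\log^{k}(1/\eta)k^{k}$, which must be at most $\eta/4$ for (17). The function name `claim_parameters` is hypothetical, and $i_{\max}$ is kept real-valued here rather than rounded to an integer.

```python
import math

def claim_parameters(eta, k):
    """Step size s and cutoff index i_max as fixed at the start of Section 4.1."""
    log_term = math.log(1.0 / eta)
    s = 0.01 * eta / (k * log_term) ** k
    i_max = 10.0 * log_term ** k / s
    return s, i_max

eta, k = 0.01, 3
s, i_max = claim_parameters(eta, k)

# Largest exponent x = i * s^2 used in (11); algebraically equal to 0.1 * eta / k^k.
x_max = i_max * s * s

# Upper bound on the second term of (16), using (t-1)!! <= k^k;
# algebraically equal to 0.2 * eta.
final_term = 20.0 * s * math.log(1.0 / eta) ** k * k ** k

print(s, i_max, x_max, final_term)
```

Both identities follow by substituting the definition of $s$, so the same margins hold for any $\eta\in(0,1)$ and any positive integer $k$.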

The other direction, i.e., $\int_{0}^{+\infty}g_{\eta}(z)z^{t}\phi(z)\mathrm{d}z\geq-\eta/2$, can be shown with a similar argument:

\begin{align*}
\int_{0}^{+\infty}g_{\eta}(z)z^{t}\phi(z)\mathrm{d}z
&\geq\sum_{i=0}^{i_{\max}-1}\left\{-\delta\int_{(i+\delta)s}^{(i+1)s}z^{t}\phi(z)\mathrm{d}z+(1-\delta)\int_{(i+1)s}^{(i+1+\delta)s}z^{t}\phi(z)\mathrm{d}z\right\}\\
&\geq\sum_{i=0}^{i_{\max}-1}\left\{-\delta\left(\frac{1-\delta}{\delta}\right)(1+2is^{2})+1-\delta\right\}\int_{(i+1)s}^{(i+1+\delta)s}z^{t}\phi(z)\mathrm{d}z\\
&\geq-\sum_{i=0}^{i_{\max}-1}2(1-\delta)is^{2}\int_{(i+1)s}^{(i+1+\delta)s}z^{t}\phi(z)\mathrm{d}z\geq-20s\log^{k}(1/\eta)(t-1)!!\geq-\frac{\eta}{4}\;,
\end{align*}

where instead of (11) we used the bound $\int_{(i+\delta)s}^{(i+1)s}z^{t}\phi(z)\mathrm{d}z\leq\frac{1-\delta}{\delta}(1+2is^{2})\int_{(i+1)s}^{(i+1+\delta)s}z^{t}\phi(z)\mathrm{d}z$, which can be shown in a similar manner.
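For readers who want the last two inequalities of this chain spelled out, here is one way to see them. This is a sketch that assumes $1\leq t\leq k$, $\delta\in[0,1]$, and that the Gaussian moment bound takes the form $\int_{0}^{+\infty}z^{t}\phi(z)\mathrm{d}z\leq(t-1)!!$; these are read off from the justifications of (14), (16), and (17) rather than restated in this part of the proof. Using $1-\delta\leq 1$, $is\leq 10\log^{k}(1/\eta)$, and the fact that the intervals $[(i+1)s,(i+1+\delta)s]$ are disjoint subsets of $[0,+\infty)$,

$$
\sum_{i=0}^{i_{\max}-1}2(1-\delta)\,is^{2}\int_{(i+1)s}^{(i+1+\delta)s}z^{t}\phi(z)\mathrm{d}z
\;\leq\;2s\cdot 10\log^{k}(1/\eta)\int_{0}^{+\infty}z^{t}\phi(z)\mathrm{d}z
\;\leq\;20s\log^{k}(1/\eta)(t-1)!!\;,
$$

and, substituting $s=0.01\eta/(k^{k}\log^{k}(1/\eta))$ together with $(t-1)!!\leq k^{k}$,

$$
20s\log^{k}(1/\eta)(t-1)!!\;=\;0.2\,\eta\,\frac{(t-1)!!}{k^{k}}\;\leq\;0.2\,\eta\;\leq\;\frac{\eta}{4}\;.
$$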

Finally, we bound $\Pr[z\in U]$ (which is the same as $\Pr[g_{\eta}(z)=1-\delta]$) as follows:

$$
\begin{aligned}
\Pr[z\in U]
&\leq 2\sum_{i=0}^{i_{\max}-1}\int_{is}^{(i+\delta)s}\phi(z)\mathrm{d}z\leq 2\sum_{i=0}^{i_{\max}-1}\delta(1+2is^{2})\int_{(i+1)s}^{(i+2)s}\phi(z)\mathrm{d}z\\
&=2\delta\sum_{i=0}^{i_{\max}-1}\int_{(i+1)s}^{(i+2)s}\phi(z)\mathrm{d}z+4\delta\sum_{i=0}^{i_{\max}-1}is^{2}\int_{(i+1)s}^{(i+2)s}\phi(z)\mathrm{d}z\\
&\leq 2\delta\sum_{i=0}^{i_{\max}-1}\int_{(i+1)s}^{(i+2)s}\phi(z)\mathrm{d}z+40s\log^{k}(1/\eta)\sum_{i=0}^{i_{\max}-1}\int_{(i+1)s}^{(i+2)s}\phi(z)\mathrm{d}z\\
&\leq 2\delta\cdot\frac{1}{2}+40s\log^{k}(1/\eta)\cdot\frac{1}{2}\leq\delta+\eta\;,
\end{aligned}
$$

where the first line uses a ratio bound similar to (11), the third line uses that $is\leq 10\log^{k}(1/\eta)$, and the last line uses that $\sum_{i}\int_{(i+1)s}^{(i+2)s}\phi(z)\mathrm{d}z\leq 1/2$ and that $s:=0.01\eta/(k^{k}\log^{k}(1/\eta))$.

Similarly, it can be shown that $\Pr[z\in U]\geq\delta-\eta$, which completes the proof. ∎

5 Proof of Lemma 3.4

The high-level approach for proving Lemma 3.4 is to first show a relaxed version of the statement, in which the “hard set $T$” is replaced by a “soft set $f$”, i.e., a function $f:\mathbb{R}\to[0,1]$. That is, define the distribution

$$
P_{f}(x)=\phi(x-\varepsilon)(1-\delta f(x))Z^{-1}\;\;\text{for}\;\;Z:=\int_{-\infty}^{+\infty}\phi(x-\varepsilon)(1-\delta f(x))\mathrm{d}x\;.\tag{18}
$$

We seek to find an $f:\mathbb{R}\to[0,1]$ satisfying the following two constraints:

  1. (Moment matching) $\operatorname*{\mathbf{E}}_{x\sim P_{f}}[x^{t}]=\operatorname*{\mathbf{E}}_{x\sim\mathcal{N}(0,1)}[x^{t}]$, and

  2. ($f$ has small mass) $Z=1-\delta\varepsilon^{0.3}$.

Note that this is indeed a relaxed version of the statement of Lemma 3.4, obtained by replacing $\mathds{1}(x\in T)$ by $f(x)$: the first constraint above is the relaxed version of (1), and the second constraint ensures $\operatorname*{\mathbf{E}}_{x\sim\mathcal{N}(\varepsilon,1)}[f(x)]\leq\varepsilon^{0.3}$, which is the relaxed version of the constraint $\Pr_{x\sim\mathcal{N}(\varepsilon,1)}[x\in T]\leq\varepsilon^{0.3}$ appearing in Lemma 3.4. Once we find such an $f$, we can convert it to a “hard set” $T$ that is a union of intervals by using a randomized rounding technique, similar to [DKZ20]. That technique by itself gives no guarantee on the number of intervals produced, but, using Proposition 4.3 as in the previous section, we can bring this number down to $k$.
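The link between the normalization $Z$ and the mass of $f$ is a one-line computation from (18), written out here for convenience (nothing beyond the definition of $Z$ and the fact that $\phi(x-\varepsilon)$ is the pdf of $\mathcal{N}(\varepsilon,1)$ is used):

$$
Z=\int_{-\infty}^{+\infty}\phi(x-\varepsilon)\mathrm{d}x-\delta\int_{-\infty}^{+\infty}\phi(x-\varepsilon)f(x)\mathrm{d}x
=1-\delta\operatorname*{\mathbf{E}}_{x\sim\mathcal{N}(\varepsilon,1)}[f(x)]\;.
$$

Hence $Z=1-\delta\varepsilon^{0.3}$ holds exactly when $\operatorname*{\mathbf{E}}_{x\sim\mathcal{N}(\varepsilon,1)}[f(x)]=\varepsilon^{0.3}$, which in particular gives the upper bound $\varepsilon^{0.3}$ on the mass of $f$.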

We will establish Items 1 and 2 above in two steps. We will find an $f$ consisting of two parts, $f(x)=f_{1}(x)+f_{2}(x)$, with $f_{1}(x)\in[\varepsilon,1/2]$ and $f_{2}(x)\in[-\varepsilon,\varepsilon]$ (so that overall $f(x)\in[0,1]$). For the first part (cf. Claim 5.1), the idea is to start with $f_{1}$ being the function that would make the distribution $P_{f_{1}}$ (cf. the notation of (18)) exactly the same as $\phi(x)$ (the pdf of $\mathcal{N}(0,1)$), and then clip $f_{1}(x)$ so that it only takes values in $[\varepsilon,1/2]$. The important observation is that the clipping only happens for $x$ with large $|x|$. Thus, $P_{f_{1}}$ is already equal to $\phi(x)$ on a large part of the real line. The remaining part contributes a negligible amount to the moments, so we can correct the moments by adding a correction term $f_{2}(x)$ to $f_{1}(x)$. We find $f_{2}$ by finding an appropriate polynomial using a technique from [DKS17].

We now implement the two steps of the proof. For the first one (regarding $f_{1}$), we have the following.

Claim 5.1.

Fix $\delta=\sqrt{\varepsilon}$. There exists an $f_{1}:\mathbb{R}\to[\varepsilon,1/2]$ such that $\int_{-\infty}^{+\infty}\phi(x-\varepsilon)(1-\delta f_{1}(x))\mathrm{d}x=1-\delta\varepsilon^{0.3}$ and the distribution with pdf

$$
P_{f_{1}}(x)=\frac{\phi(x-\varepsilon)(1-\delta f_{1}(x))}{1-\delta\varepsilon^{0.3}}
$$

satisfies $P_{f_{1}}(x)=\phi(x)$ for all $x$ with $|x|\leq 1/\varepsilon^{2/11}$.

Proof.

Define $\xi:=\delta\varepsilon^{0.3}=\varepsilon^{0.8}$ (recalling that $\delta=\sqrt{\varepsilon}$). For notational convenience, we will consider the following equivalent statement of our claim: there exists an $h:\mathbb{R}\to[1-\delta/2,1-\delta\varepsilon]$ such that $\int_{-\infty}^{+\infty}\phi(x-\varepsilon)h(x)\mathrm{d}x=1-\xi=1-\delta\varepsilon^{0.3}$ and $\frac{\phi(x-\varepsilon)h(x)}{1-\xi}=\phi(x)$ for all $x$ with $|x|\leq 1/\varepsilon^{2/11}$ (the original statement then follows by letting $f_{1}(x)=(1-h(x))/\delta$).

To show our claim, let us first consider the function $\tilde{h}$, which we define so that

$$
\frac{\phi(x-\varepsilon)\tilde{h}(x)}{1-\xi}=\phi(x)\quad\text{for all $x\in\mathbb{R}$}\;.
$$

That is, we define $\tilde{h}(x):=\exp(\varepsilon^{2}/2-\varepsilon x)(1-\xi)$. Then, we define $h$ to be the version of $\tilde{h}$ clipped to the interval $[1-\delta/2,1-\delta\varepsilon]$, i.e.,

$$
h(x):=\begin{cases}1-\delta\varepsilon,&\text{if $\tilde{h}(x)>1-\delta\varepsilon$}\\ \tilde{h}(x),&\text{if $1-\delta/2\leq\tilde{h}(x)\leq 1-\delta\varepsilon$}\\ 1-\delta/2,&\text{if $\tilde{h}(x)<1-\delta/2$}\;.\end{cases}
$$
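The closed form of $\tilde{h}$ follows from a short computation with the standard Gaussian density $\phi(x)=\frac{1}{\sqrt{2\pi}}e^{-x^{2}/2}$, spelled out here for convenience:

$$
\frac{\phi(x)}{\phi(x-\varepsilon)}=\exp\left(-\frac{x^{2}}{2}+\frac{(x-\varepsilon)^{2}}{2}\right)=\exp\left(\frac{\varepsilon^{2}}{2}-\varepsilon x\right),
\qquad\text{so}\qquad
\tilde{h}(x)=(1-\xi)\frac{\phi(x)}{\phi(x-\varepsilon)}=(1-\xi)\exp\left(\frac{\varepsilon^{2}}{2}-\varepsilon x\right).
$$

Under the change of variables $f_{1}(x)=(1-h(x))/\delta$, the clipping thresholds $1-\delta\varepsilon$ and $1-\delta/2$ for $h$ correspond to the values $\varepsilon$ and $1/2$ for $f_{1}$.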

Finally, it remains to verify that the clipping happens only for $x$ with $|x|>1/\varepsilon^{2/11}$. First, note that $\tilde{h}$ is a decreasing function. Plugging in $x=-1/\varepsilon^{2/11}$, we see that $\tilde{h}(-1/\varepsilon^{2/11})=1-\Theta(\varepsilon^{4/5})$ (this follows from a polynomial approximation of the $e^{x}$ function), which is less than the clipping threshold of $1-\delta\varepsilon=1-\varepsilon^{1.5}$. Thus, by monotonicity of $\tilde{h}$, $\sup\{x\in\mathbb{R}:\tilde{h}(x)>1-\delta\varepsilon\}<-1/\varepsilon^{2/11}$. Similarly, we can check the other boundary. ∎
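The location of the clipping region can also be checked numerically for a concrete value of $\varepsilon$. The following is a minimal sketch, not part of the proof: the function names and the sample value $\varepsilon=10^{-4}$ are illustrative assumptions, and the check covers only where clipping occurs (it says nothing about the normalization condition in Claim 5.1). Since $\log\tilde{h}$ is linear and decreasing in $x$, each threshold is crossed at a single point that can be computed in closed form.

```python
import math


def h_tilde(x, eps):
    """Unclipped function exp(eps^2/2 - eps*x) * (1 - xi), with xi = eps^0.8."""
    xi = math.sqrt(eps) * eps ** 0.3
    return math.exp(eps ** 2 / 2 - eps * x) * (1 - xi)


def h(x, eps):
    """h_tilde clipped to the interval [1 - delta/2, 1 - delta*eps], delta = sqrt(eps)."""
    delta = math.sqrt(eps)
    return min(max(h_tilde(x, eps), 1 - delta / 2), 1 - delta * eps)


def f1(x, eps):
    """Soft set recovered from h via f1 = (1 - h) / delta."""
    return (1 - h(x, eps)) / math.sqrt(eps)


def clipping_boundaries(eps):
    """Return (x_upper, x_lower): h_tilde exceeds 1 - delta*eps exactly for
    x < x_upper and falls below 1 - delta/2 exactly for x > x_lower."""
    delta = math.sqrt(eps)
    xi = delta * eps ** 0.3
    log_scale = eps ** 2 / 2 + math.log1p(-xi)  # log h_tilde(x) = log_scale - eps*x
    x_upper = (log_scale - math.log1p(-delta * eps)) / eps
    x_lower = (log_scale - math.log1p(-delta / 2)) / eps
    return x_upper, x_lower


if __name__ == "__main__":
    eps = 1e-4
    radius = eps ** (-2 / 11)
    x_upper, x_lower = clipping_boundaries(eps)
    print(f"no clipping on [{x_upper:.2f}, {x_lower:.2f}]; radius = {radius:.2f}")
```

For this sample $\varepsilon$, the unclipped region strictly contains $[-1/\varepsilon^{2/11},1/\varepsilon^{2/11}]$, in line with the claim; for larger $\varepsilon$ the containment can fail, which is consistent with the claim being asymptotic in small $\varepsilon$.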

We now move to the second part of the argument, which aims to find an $f_{2}:\mathbb{R}\to[-\varepsilon,\varepsilon]$ such that, when $f=f_{1}+f_{2}$, the moments of $P_{f}$ are corrected and become equal to those of $\mathcal{N}(0,1)$. Fix $C=\sqrt{\varepsilon}$ and $\xi=\delta\varepsilon^{0.3}=\varepsilon^{0.8}$. We will search for an $f_{2}$ of the particular form below

$$
f_{2}(x)=\frac{1-\xi}{\delta}\frac{p(x)}{\phi(x-\varepsilon)}\mathds{1}(|x|\leq C)\;,\tag{19}
$$

for some appropriate polynomial $p$ with $\int_{-C}^{C}p(x)\mathrm{d}x=0$ and small $|p(x)|$ for all $x\in[-C,C]$. We now show how to find that polynomial and ensure the above properties. Our moment-matching constraint is the following (note that the normalization of the distribution is still $1-\xi$, because of the property $\int_{-C}^{C}p(x)\mathrm{d}x=0$):

$$
\int_{-\infty}^{+\infty}x^{t}\frac{\phi(x-\varepsilon)(1-\delta f_{1}(x)-\delta f_{2}(x))}{1-\xi}\mathrm{d}x=\int_{-\infty}^{+\infty}x^{t}\phi(x)\mathrm{d}x\;.
$$

By letting $P_{f_{1}}(x)=\frac{\phi(x-\varepsilon)(1-\delta f_{1}(x))}{1-\xi}$ as in Claim 5.1, the above is equivalent to

$$
\int_{-C}^{+C}x^{t}p(x)\mathrm{d}x=\int_{-\infty}^{+\infty}x^{t}P_{f_{1}}(x)\mathrm{d}x-\int_{-\infty}^{+\infty}x^{t}\phi(x)\mathrm{d}x\;.\tag{20}
$$
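To make the linear-algebraic content of (20) concrete, the following sketch solves a toy instance: given prescribed values for the right-hand side, it finds the polynomial of degree at most $k$ whose moments on $[-C,C]$ match them. This is an illustration only. It works in the monomial basis with exact rational arithmetic rather than the Legendre basis used in [DKS17], and the function names, the rational value of $C$, and the target values are all hypothetical (in the proof, $C=\sqrt{\varepsilon}$ and the targets are the moment differences between $P_{f_{1}}$ and $\phi$).

```python
from fractions import Fraction


def monomial_integral(n, C):
    """Integral of x^n over [-C, C]: zero for odd n, 2*C^(n+1)/(n+1) for even n."""
    if n % 2 == 1:
        return Fraction(0)
    return 2 * C ** (n + 1) / Fraction(n + 1)


def solve_moment_polynomial(targets, C):
    """Return coefficients c_0..c_k of p(x) = sum_j c_j x^j such that the
    integral of x^t p(x) over [-C, C] equals targets[t] for t = 0..k."""
    n = len(targets)
    # Augmented matrix [A | b] with A[t][j] = integral of x^(t+j) over [-C, C].
    rows = [
        [monomial_integral(t + j, C) for j in range(n)] + [Fraction(targets[t])]
        for t in range(n)
    ]
    # Gauss-Jordan elimination in exact arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inverse = 1 / rows[col][col]
        rows[col] = [value * inverse for value in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[t][n] for t in range(n)]


if __name__ == "__main__":
    C = Fraction(1, 2)  # illustrative rational stand-in for sqrt(eps)
    targets = [Fraction(0), Fraction(1, 100), Fraction(-1, 50), Fraction(3, 1000)]
    print(solve_moment_polynomial(targets, C))
```

The matrix of this system is the Gram matrix of the monomials $1,x,\ldots,x^{k}$ on $[-C,C]$, which is positive definite; this is one way to see why a solution of degree at most $k$ exists and is unique. Setting the zeroth target to $0$ encodes the constraint $\int_{-C}^{C}p(x)\mathrm{d}x=0$.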

The rest of the proof mirrors that in [DKS17]. By Claim 5.8 in [DKS17], there exists a unique polynomial $p$ satisfying (20), which has the form $p(x)=\sum_{i=0}^{k}a_{i}P_{i}(x/C)$, where $P_{i}$ is the $i$-th Legendre polynomial and $a_{i}=\frac{2i+1}{2C}\int_{-C}^{C}P_{i}(x/C)p(x)\mathrm{d}x$. We want to show that $|a_{i}|=O(i\varepsilon^{5})$. We first explain why this suffices. By properties of the Legendre polynomials (see Appendix A for the basic properties that we will use), it would imply that $|p(x)|=O(\sum_{i=1}^{k}|a_{i}|)=O(k^{2}\varepsilon^{5})$ for all $x\in[-C,C]$. We would then be done because, after combining with (19), we would obtain that for all $x\in[-C,C]$ it holds

$$
|f_{2}(x)|\leq\frac{(1-\xi)|p(x)|}{\delta\phi(x-\varepsilon)}\leq\frac{O(\varepsilon^{5}k^{2})}{\delta\phi(x-\varepsilon)}<\varepsilon\;,
$$

where we used $\delta=\sqrt{\varepsilon}$, $k\leq c/\varepsilon^{0.15}$, and that $\phi(x-\varepsilon)\geq 1/3$ for all $x\in[-1,1]$. We conclude by showing that $|a_{i}|=O(i\varepsilon^{5})$. First, by orthogonality of the $P_{i}$’s and (20),

$$
\begin{aligned}
\left|\int_{-C}^{C}P_{i}(x/C)p(x)\mathrm{d}x\right|
&=\left|\int_{-\infty}^{+\infty}P_{i}(x/C)(\phi(x)-P_{f_{1}}(x))\mathrm{d}x\right|=\left|\int_{|x|>1/\varepsilon^{2/11}}P_{i}(x/C)(\phi(x)-P_{f_{1}}(x))\mathrm{d}x\right|\\
&\leq\left|\int_{|x|>1/\varepsilon^{2/11}}P_{i}(x/C)\phi(x)\mathrm{d}x\right|+\left|\int_{|x|>1/\varepsilon^{2/11}}P_{i}(x/C)P_{f_{1}}(x)\mathrm{d}x\right|\;,
\end{aligned}
$$

where the second step used that $P_{f_{1}}(x)=\phi(x)$ for all $x$ with $|x|\leq 1/\varepsilon^{2/11}$ by Claim 5.1. We will show the bound for the first term (the other one is similar).
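The two Legendre facts used in this part of the argument, namely the coefficient formula $a_{i}=\frac{2i+1}{2C}\int_{-C}^{C}P_{i}(x/C)p(x)\mathrm{d}x$ and the bound $|p(x)|\leq\sum_{i}|a_{i}|$ on $[-C,C]$ (which follows from $|P_{i}(u)|\leq 1$ for $u\in[-1,1]$), can be checked numerically on a toy polynomial. The sketch below is an illustration under assumptions: the helper names, the value $C=0.5$, and the coefficients are hypothetical, Legendre polynomials are evaluated by Bonnet's recurrence, and integrals are approximated by the composite Simpson rule.

```python
def legendre(i, u):
    """Evaluate the i-th Legendre polynomial P_i(u) via Bonnet's recurrence
    (n+1) P_{n+1}(u) = (2n+1) u P_n(u) - n P_{n-1}(u)."""
    if i == 0:
        return 1.0
    previous, current = 1.0, u
    for n in range(1, i):
        previous, current = current, ((2 * n + 1) * u * current - n * previous) / (n + 1)
    return current


def simpson(func, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    step = (b - a) / n
    total = func(a) + func(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * func(a + j * step)
    return total * step / 3


def legendre_coefficient(p, i, C):
    """a_i = (2i+1)/(2C) * integral over [-C, C] of P_i(x/C) p(x) dx."""
    return (2 * i + 1) / (2 * C) * simpson(lambda x: legendre(i, x / C) * p(x), -C, C)


def make_polynomial(coeffs, C):
    """Return p(x) = sum_i coeffs[i] * P_i(x/C) as a Python function."""
    return lambda x: sum(a * legendre(i, x / C) for i, a in enumerate(coeffs))


if __name__ == "__main__":
    C = 0.5                         # illustrative value, not sqrt(eps)
    coeffs = [0.0, 0.3, -0.2, 0.1]  # a_0 = 0 encodes a zero integral of p
    p = make_polynomial(coeffs, C)
    print([round(legendre_coefficient(p, i, C), 8) for i in range(len(coeffs))])
```

Recovering the coefficients works because $\int_{-C}^{C}P_{i}(x/C)P_{j}(x/C)\mathrm{d}x=\frac{2C}{2i+1}$ when $i=j$ and $0$ otherwise.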

Claim 5.2.

Fix $C=\sqrt{\varepsilon}$, and let $P_{i}$ denote the $i$-th Legendre polynomial and $p$ be the solution to (20). Then, $\left|\int_{|x|>1/\varepsilon^{2/11}}P_{i}(x/C)\phi(x)\mathrm{d}x\right|=O(\varepsilon^{5})$.

Proof.

We will use the known property that the $i$-th Legendre polynomial can be written as $P_{i}(x)=\frac{1}{2^{i}}\sum_{j=0}^{\lfloor i/2\rfloor}(-1)^{j}\binom{i}{j}\binom{2i-2j}{i}x^{i-2j}$. We will also use that there is negligible mass at the tails $|x|>1/\varepsilon^{2/11}$. We have that

\begin{align*}
\left|\int_{-C}^{C} P_{i}(x/C)\,p(x)\,\mathrm{d}x\right| &= O((k/C)^{3k}) \int_{|x|>1/\varepsilon^{2/11}} |x|^{k}\phi(x)\,\mathrm{d}x \\
&\leq O((k/C)^{3k}) \int_{|x|>1/\varepsilon^{2/11}} |x|^{k}e^{-x^{2}/2}\,\mathrm{d}x \\
&\leq O((k/C)^{3k})\,\varepsilon^{10k} = O((k/\sqrt{\varepsilon})^{3k})\,\varepsilon^{10k} = O(\varepsilon^{5})\;,
\end{align*}

where the first step bounds the binomial coefficients by $k^{k}$, and the last line uses that for any $|x|>100k\log(1/\varepsilon)=\varepsilon^{-3/20}\log(1/\varepsilon)$ (recall that $k=\Theta(\varepsilon^{-3/20})$) it holds $|x|^{k}e^{-x^{2}/2}<\varepsilon^{10k}/x^{2}$, in order to bound $\int_{|x|>\varepsilon^{-2/11}}|x|^{k}e^{-x^{2}/2}\,\mathrm{d}x\leq\int_{|x|>1}|x|^{k}e^{-x^{2}/2}\,\mathrm{d}x\leq\varepsilon^{10k}\int_{|x|>1}x^{-2}\,\mathrm{d}x\leq\varepsilon^{10k}$.
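As a sanity check on the explicit expansion used in this proof, the following minimal Python sketch evaluates the Legendre polynomials through the binomial-coefficient sum (written with the alternating sign $(-1)^{j}$ of the standard formula) and compares the result against Bonnet's three-term recurrence. The function names are hypothetical and chosen only for illustration; the coefficient bound $4^{i}$ checked at the end is a crude assumption-free estimate, not the $k^{k}$ bound used in the text.

```python
from math import comb


def legendre_explicit(i, x):
    """Evaluate P_i(x) via the explicit sum over binomial coefficients."""
    total = 0.0
    for j in range(i // 2 + 1):
        total += (-1) ** j * comb(i, j) * comb(2 * i - 2 * j, i) * x ** (i - 2 * j)
    return total / 2 ** i


def legendre_recurrence(i, x):
    """Evaluate P_i(x) via Bonnet's recurrence
    (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)."""
    p_prev, p_cur = 1.0, x
    if i == 0:
        return p_prev
    for n in range(1, i):
        p_prev, p_cur = p_cur, ((2 * n + 1) * x * p_cur - n * p_prev) / (n + 1)
    return p_cur


def max_abs_coefficient(i):
    """Largest coefficient of P_i in absolute value (monomial basis)."""
    return max(comb(i, j) * comb(2 * i - 2 * j, i) for j in range(i // 2 + 1)) / 2 ** i


if __name__ == "__main__":
    for i in range(6):
        print(i, legendre_explicit(i, 0.7), legendre_recurrence(i, 0.7),
              max_abs_coefficient(i))
```

Since every coefficient of $P_{i}$ is bounded in terms of $i$ alone, the tail integral of $P_{i}(x/C)\phi(x)$ is controlled by Gaussian tail moments scaled by powers of $1/C$, which is the structure of the chain of inequalities above.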

This completes the proof of Items 1 and 2. We next use a randomized rounding technique similar to [DKZ20], in order to convert this continuous $f$ to a piecewise constant $\tilde{f}:\mathbb{R}\to\{0,1\}$, i.e., a hard set. We show the following in Section 5.1:

Claim 5.3.

For any $\eta>0$ there exists a $((k\log(1/\eta))^{\mathrm{poly}(k)}/\eta^{2})$-piecewise constant function $\tilde{f}:\mathbb{R}\to\{0,1\}$ such that $\Pr_{x\sim\mathcal{N}(\varepsilon,1)}[\tilde{f}(x)]\leq 2\delta$ and for all $t=0,\ldots,k$ it holds $\left|\operatorname*{\mathbf{E}}_{x\sim\mathcal{N}(\varepsilon,1)}[x^{t}(1-\delta\tilde{f}(x))]\,Z^{-1}-\operatorname*{\mathbf{E}}_{x\sim\mathcal{N}(0,1)}[x^{t}]\right|\leq\eta$, where $Z=\operatorname*{\mathbf{E}}_{x\sim\mathcal{N}(\varepsilon,1)}[1-\delta\tilde{f}(x)]$.

The idea for Claim 5.3 is to split $\mathbb{R}$ into $[is,(i+1)s]$, for $i\in\mathbb{Z}$ and a sufficiently small size $s$, and to let $\tilde{f}(x)$ be constant in the interval $x\in[is,(i+1)s)$, taking the following values:

$$\tilde{f}(x)=\begin{cases}1, & \text{with probability } p_{i}:=\int_{is}^{(i+1)s}\phi(x-\varepsilon)f(x)\,\mathrm{d}x \Big/ \int_{is}^{(i+1)s}\phi(x-\varepsilon)\,\mathrm{d}x\\ 0, & \text{with probability } 1-p_{i}\end{cases} \tag{21}$$

We want to show that $\operatorname*{\mathbf{E}}_{x\sim\mathcal{N}(\varepsilon,1)}[x^{t}(1-\delta\tilde{f}(x))]\approx\operatorname*{\mathbf{E}}_{x\sim\mathcal{N}(\varepsilon,1)}[x^{t}(1-\delta f(x))]$ (which we have already shown to be equal to $\operatorname*{\mathbf{E}}_{x\sim\mathcal{N}(0,1)}[x^{t}]$). Let $I_{i}:=\int_{is}^{(i+1)s}x^{t}\phi(x-\varepsilon)\delta(f(x)-\tilde{f}(x))\,\mathrm{d}x$ be the contribution due to the $i$-th interval. Then, using the Taylor approximation $x^{t}=(is)^{t}+(x-is)\,t\,\xi^{t-1}$ for some $\xi$ between $is$ and $x$, the expected (with respect to $\tilde{f}$'s randomness) value of $\sum_{i}I_{i}$ is

$$\operatorname*{\mathbf{E}}\bigg[\sum_{i}I_{i}\bigg]=\sum_{i}(is)^{t}\int_{is}^{(i+1)s}\phi(x-\varepsilon)\delta(f(x)-p_{i})\,\mathrm{d}x+t\xi^{t-1}\int_{is}^{(i+1)s}(x-is)\phi(x-\varepsilon)\delta(f(x)-p_{i})\,\mathrm{d}x.$$

The first term above is zero by definition of the $p_{i}$'s. We can show that the second term is at most $\eta$ by choosing an appropriately small interval size $s$.

The proof of Lemma 3.4 is completed by reducing the number of pieces to $k$ using Claim 5.3 and Proposition 4.3 as we did in Lemma 3.3.

5.1 Proof of Claim 5.3

We fix the following parameters throughout the proof (where $C$ denotes a sufficiently large absolute constant):

  • $i_{\max}=(C\log(1/\eta))^{k/2}/s$

  • $s=\eta^{2}/\big(k^{3k}C^{2k^{2}}\log^{k^{2}}(1/\eta)\big)$

We partition the real line in pieces $[is,(i+1)s)$ for $i\in\mathbb{Z}$. We define $\tilde{f}$ to be the following random piecewise-constant function: For each $i\in\{-i_{\max},\ldots,i_{\max}\}$ we let $\tilde{f}(x)$ be constant in the interval $x\in[is,(i+1)s)$, taking the following value:

$$\tilde{f}(x)=\begin{cases}1, & \text{with probability } p_{i}:=\int_{is}^{(i+1)s}\phi(x-\varepsilon)f(x)\,\mathrm{d}x \Big/ \int_{is}^{(i+1)s}\phi(x-\varepsilon)\,\mathrm{d}x\\ 0, & \text{with probability } 1-p_{i}\end{cases} \tag{22}$$

and we define $\tilde{f}(x)=0$ with probability 1 in the entire $(-\infty,-i_{\max}s)\cup[i_{\max}s,+\infty)$.
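The construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `f_soft` is a hypothetical stand-in for the continuous $f$ (any function with values in $[0,1]$ would do), the parameter values in the demo are small illustrative choices rather than the values of $s$ and $i_{\max}$ fixed in the proof, and the bin integrals are approximated with a midpoint rule instead of being computed exactly.

```python
import math
import random


def phi(x):
    """Standard Gaussian density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def f_soft(x):
    """Hypothetical smooth function with values in [0, 1] (stand-in for f)."""
    return 0.5 * (1 + math.sin(3 * x) * math.exp(-x * x / 8))


def bin_integral(g, a, b, m=16):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / m
    return h * sum(g(a + (j + 0.5) * h) for j in range(m))


def round_to_hard_set(eps, s, i_max, rng):
    """Return (p, f_tilde): bin probabilities p_i and one random 0/1 value per bin."""
    p, f_tilde = {}, {}
    for i in range(-i_max, i_max + 1):
        a, b = i * s, (i + 1) * s
        num = bin_integral(lambda x: phi(x - eps) * f_soft(x), a, b)
        den = bin_integral(lambda x: phi(x - eps), a, b)
        p[i] = num / den  # Gaussian-weighted average of f on the bin
        f_tilde[i] = 1 if rng.random() < p[i] else 0
    return p, f_tilde


def moment_gap(t, eps, delta, s, i_max, values):
    """Sum over bins of the integral of x^t phi(x - eps) delta (f(x) - values[i])."""
    total = 0.0
    for i in range(-i_max, i_max + 1):
        a, b = i * s, (i + 1) * s
        total += bin_integral(
            lambda x: x ** t * phi(x - eps) * delta * (f_soft(x) - values[i]), a, b)
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    eps, delta, s, i_max = 0.1, 0.5, 0.01, 600  # illustrative values only
    p, f_tilde = round_to_hard_set(eps, s, i_max, rng)
    for t in range(4):
        print(t, moment_gap(t, eps, delta, s, i_max, p),
              moment_gap(t, eps, delta, s, i_max, f_tilde))
```

Passing `p` to `moment_gap` gives the gap in expectation over the rounding, while passing `f_tilde` gives the gap for one realized hard set; the difference between the two is what the concentration step of the proof controls.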

Our goal is to show that for all $t=0,\ldots,k$, we have $\left|\int_{\mathbb{R}}x^{t}P_{f}(x)\,\mathrm{d}x-\int_{\mathbb{R}}x^{t}P_{\tilde{f}}(x)\,\mathrm{d}x\right|\ll\eta$, where we are using the notation from (18). We will do this in two steps: we will first show that $\int_{\mathbb{R}}x^{t}\phi(x-\varepsilon)(1-\delta\tilde{f}(x))\,\mathrm{d}x$ is approximately (up to an additive term of $\eta$) equal to $\int_{\mathbb{R}}x^{t}\phi(x-\varepsilon)(1-\delta f(x))\,\mathrm{d}x$ and then we will do the same for the normalizing factor $\int_{\mathbb{R}}\phi(x-\varepsilon)(1-\delta\tilde{f}(x))\,\mathrm{d}x$.

We start with the first part, which we will do by a probabilistic argument. First,

\begin{align*}
\int\limits_{-\infty}^{\infty} & x^{t}\phi(x-\varepsilon)(1-\delta\tilde{f}(x))\,\mathrm{d}x-\int\limits_{-\infty}^{\infty}x^{t}\phi(x-\varepsilon)(1-\delta f(x))\,\mathrm{d}x \\
&=\int\limits_{(i_{\max}+1)s}^{\infty}x^{t}\phi(x-\varepsilon)\,\mathrm{d}x+\int\limits_{-\infty}^{-i_{\max}s}x^{t}\phi(x-\varepsilon)\,\mathrm{d}x+\sum_{i=-i_{\max}}^{i_{\max}}\int\limits_{is}^{(i+1)s}x^{t}\phi(x-\varepsilon)\delta(f(x)-\tilde{f}(x))\,\mathrm{d}x\;. \tag{23}
\end{align*}

We note that the first two terms are negligible, i.e., less than a small multiple of $\eta$. This is because of the fact $\Pr_{z\sim\mathcal{N}(0,1)}[|z|^{t}>\beta]\leq e^{-\beta^{2}/2}$ for all $\beta\geq 1$, applied with $\beta=i_{\max}s=(C\log(1/\eta))^{k/2}$.

For the remaining sum, let us use the notation $I_{i}:=\int_{is}^{(i+1)s}x^{t}\phi(x-\varepsilon)\delta(f(x)-\tilde{f}(x))\,\mathrm{d}x$. These are random integrals, where the randomness comes from how $\tilde{f}(x)$ is defined in $[is,(i+1)s)$. The goal is to show that with non-zero probability $\left|\sum_{i=-i_{\max}}^{i_{\max}}I_{i}\right|\ll\eta$. Then, by the probabilistic argument, we would know that such an $\tilde{f}$ exists.

We start with the expectation of these $I_{i}$'s, where we will employ Taylor's theorem for $x^{t}$, i.e., $x^{t}=(is)^{t}+(x-is)\,t\,\xi^{t-1}$ for some $\xi=\xi(x)$ between $is$ and $x$. We have that:

\begin{align*}
\operatorname*{\mathbf{E}}\left[\sum_{i=-i_{\max}}^{i_{\max}}I_{i}\right] &=\sum_{i=-i_{\max}}^{i_{\max}}\int\limits_{is}^{(i+1)s}x^{t}\phi(x-\varepsilon)\delta(f(x)-p_{i})\,\mathrm{d}x \\
&=\sum_{i=-i_{\max}}^{i_{\max}}(is)^{t}\int\limits_{is}^{(i+1)s}\phi(x-\varepsilon)\delta(f(x)-p_{i})\,\mathrm{d}x+t\xi^{t-1}\int\limits_{is}^{(i+1)s}(x-is)\phi(x-\varepsilon)\delta(f(x)-p_{i})\,\mathrm{d}x\;.
\end{align*}

The first term above is zero because of the definition of $p_{i}$ from (22). For the second term, we have the following bounds:

\begin{align*}
\bigg|\sum_{i=-i_{\max}}^{i_{\max}}t\xi^{t-1}\int\limits_{is}^{(i+1)s}(x-is)\phi(x-\varepsilon) & (f(x)-p_{i})\,\mathrm{d}x\bigg| \\
&\leq t(i_{\max}s)^{t-1}\sum_{i=-i_{\max}}^{i_{\max}}\int\limits_{is}^{(i+1)s}|x-is|\,\phi(x-\varepsilon)\,\mathrm{d}x \\
&\leq t(i_{\max}s)^{t-1}\sum_{i=-i_{\max}}^{i_{\max}}s\int\limits_{is}^{(i+1)s}\phi(x-\varepsilon)\,\mathrm{d}x \\
&\leq st(i_{\max}s)^{t-1}\leq st\big(C^{k/2}\log^{k/2}(1/\eta)\big)^{t-1}\ll\eta\;,
\end{align*}

where the first line uses that $\delta\leq 1$, $\xi\leq i_{\max}s$, $f(x)\in[0,1]$, and $p_{i}\in[0,1]$, the second line uses that the integral is over an interval of length $s$, the third line first uses that $\sum_{i}\int_{is}^{(i+1)s}\phi(x-\varepsilon)\,\mathrm{d}x\leq\int_{-\infty}^{+\infty}\phi(x-\varepsilon)\,\mathrm{d}x=1$ and then uses our choice of parameters: first $i_{\max}s=(C\log(1/\eta))^{k/2}$, and finally $s=\eta^{2}/\big(k^{3k}C^{2k^{2}}\log^{k^{2}}(1/\eta)\big)$. This completes the proof that $\left|\operatorname*{\mathbf{E}}\left[\sum_{i=-i_{\max}}^{i_{\max}}I_{i}\right]\right|\ll\eta$.

We now show the non-trivial probability claim. By the Chernoff-Hoeffding bound, with probability at least $1-\tau$, it holds $|\sum_{i=-i_{\max}}^{i_{\max}}I_{i}-\operatorname*{\mathbf{E}}[\sum_{i=-i_{\max}}^{i_{\max}}I_{i}]|\lesssim\Delta\sqrt{i_{\max}\log(1/\tau)}$, where $\Delta$ is any value such that $|I_{i}|\leq\Delta$ with probability one.
In our case, we have that $|I_{i}|=|\int_{is}^{(i+1)s}x^{t}\phi(x-\varepsilon)\,\mathrm{d}x|\leq s\cdot\sup_{x\in\mathbb{R}}x^{t}e^{-x^{2}}\leq st^{t}\leq sk^{k}$. We also use $\tau=0.1/k$ because we want the conclusion to hold simultaneously over all $t=0,\ldots,k$.
Using these parameters, and our definitions for $s$ and $i_{\max}$, the application of the Chernoff-Hoeffding bound yields that $|\sum_{i=-i_{\max}}^{i_{\max}}I_{i}-\operatorname*{\mathbf{E}}[\sum_{i=-i_{\max}}^{i_{\max}}I_{i}]|\leq k^{k}s\sqrt{i_{\max}\log k}\leq k^{k}\sqrt{s}\sqrt{i_{\max}s}\sqrt{\log k}\leq\sqrt{s}(C\log(1/\eta))^{k/4}k^{k}\sqrt{\log k}\leq\eta$.
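The concentration step above can be sanity-checked numerically. The following is a minimal sketch, not part of the original argument: it assumes the standard two-sided Hoeffding inequality with explicit constants (the text hides them in $\lesssim$), and the function names and the uniform test distribution are hypothetical choices made only for illustration. In the notation of the proof, one would take `delta_bound` $=sk^{k}$, `n_terms` $=2i_{\max}+1$, and `tau` $=0.1/k$.

```python
import math
import random


def hoeffding_radius(delta_bound, n_terms, tau):
    """Two-sided Hoeffding radius for a sum of n_terms independent variables,
    each lying in [-delta_bound, delta_bound]: the sum deviates from its mean
    by more than this radius with probability at most tau."""
    return delta_bound * math.sqrt(2.0 * n_terms * math.log(2.0 / tau))


def empirical_failure_rate(delta_bound, n_terms, tau, trials=500, seed=0):
    """Fraction of trials in which a sum of independent, mean-zero variables
    (uniform on [-delta_bound, delta_bound], an illustrative assumption)
    falls outside the Hoeffding radius."""
    rng = random.Random(seed)
    radius = hoeffding_radius(delta_bound, n_terms, tau)
    failures = 0
    for _ in range(trials):
        total = sum(rng.uniform(-delta_bound, delta_bound) for _ in range(n_terms))
        if abs(total) > radius:
            failures += 1
    return failures / trials
```

The radius scales as $\Delta\sqrt{n\log(1/\tau)}$, which matches the form used in the proof up to the absolute constant.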

We now move to the second (and easier) part of the proof, regarding the normalizing factor. We want to show that $|\int_{\mathbb{R}}\phi(x-\varepsilon)(1-\delta\tilde{f}(x))\,\mathrm{d}x-\int_{\mathbb{R}}\phi(x-\varepsilon)(1-\delta f(x))\,\mathrm{d}x|\ll\eta$. As before, the parts of the integral in $(-\infty,-i_{\max}s)\cup[(i_{\max}+1)s,+\infty)$ do not matter (the error term $r$ below has $|r|\ll\eta$):

\begin{align*}
\int_{-\infty}^{+\infty}\phi(x-\varepsilon)(1-\delta\tilde{f}(x))\,\mathrm{d}x
-\int_{-\infty}^{+\infty}\phi(x-\varepsilon)(1-\delta f(x))\,\mathrm{d}x
\leq r+\sum_{i=-i_{\max}}^{i_{\max}}\int_{is}^{(i+1)s}\phi(x-\varepsilon)\delta(f(x)-\tilde{f}(x))\,\mathrm{d}x\;.
\end{align*}

Re-define $I_{i}:=\int_{is}^{(i+1)s}\phi(x-\varepsilon)\delta(f(x)-\tilde{f}(x))\,\mathrm{d}x$. By definition of $\tilde{f}$, $\operatorname*{\mathbf{E}}[I_{i}]=0$ for all the pieces $i=-i_{\max},\ldots,i_{\max}$. An application of the Chernoff-Hoeffding bound similar to the previous one also shows that $|\sum_{i=-i_{\max}}^{i_{\max}}I_{i}|\ll\eta$ with large constant probability.

The proof is now concluded by noting that

\begin{align}
\int_{-\infty}^{+\infty}x^{t}P_{\tilde{f}}(x)\,\mathrm{d}x
&=\frac{\int_{-\infty}^{+\infty}x^{t}\phi(x-\varepsilon)(1-\delta\tilde{f}(x))\,\mathrm{d}x}{\int_{-\infty}^{+\infty}\phi(x-\varepsilon)(1-\delta\tilde{f}(x))\,\mathrm{d}x}
=\frac{\int_{-\infty}^{+\infty}x^{t}\phi(x-\varepsilon)(1-\delta\tilde{f}(x))\,\mathrm{d}x\pm\eta/100}{\int_{-\infty}^{+\infty}\phi(x-\varepsilon)(1-\delta\tilde{f}(x))\,\mathrm{d}x\pm\eta/100}\notag\\
&=\frac{\int_{-\infty}^{+\infty}x^{t}\phi(x-\varepsilon)(1-\delta\tilde{f}(x))\,\mathrm{d}x\pm\eta/100}{(1\pm\eta/100)\int_{-\infty}^{+\infty}\phi(x-\varepsilon)(1-\delta\tilde{f}(x))\,\mathrm{d}x}
=(1\pm\eta/2)\int_{-\infty}^{+\infty}x^{t}P_{f}(x)\,\mathrm{d}x\pm\frac{\eta}{2}\;,\tag{24}
\end{align}

where the second line used that the normalizing factor is $\int_{-\infty}^{+\infty}\phi(x-\varepsilon)(1-\delta\tilde{f}(x))\,\mathrm{d}x=\Omega(1)$. Finally, if we used $\eta/k^{k}$ in place of $\eta$ everywhere from the beginning of this proof, we could make the RHS of (24) at most $\int_{-\infty}^{+\infty}x^{t}P_{f}(x)\,\mathrm{d}x\pm\eta$ (this is because $\int_{-\infty}^{+\infty}x^{t}P_{f}(x)\,\mathrm{d}x$ is the same as the Gaussian moments). ∎

References

  • [BBH+21] M. Brennan, G. Bresler, S. B. Hopkins, J. Li, and T. Schramm. Statistical query algorithms and low degree tests are almost equivalent. In Conference on Learning Theory, pages 774–774. PMLR, 2021.
  • [BE14] N. Balakrishnan and C. Erhard. The art of progressive censoring. Statistics for industry and technology, 2014.
  • [Ber60] D. Bernoulli. Essai d’une nouvelle analyse de la mortalité causée par la petite vérole, et des avantages de l’inoculation pour la prévenir. pages 1–14, 1760.
  • [CDIZ23] Y. Cherapanamjeri, C. Daskalakis, A. Ilyas, and M. Zampetakis. What makes a good fisherman? linear regression under self-selection bias. In Symposium on Theory of Computing, pages 1699–1712, 2023.
  • [CLL22] S. Chen, J. Li, and Y. Li. Learning (very) simple generative models is hard. In NeurIPS, 2022.
  • [Coh91] A. C. Cohen. Truncated and censored samples: theory and applications. CRC press, 1991.
  • [DGTZ18] C. Daskalakis, T. Gouleakis, C. Tzamos, and M. Zampetakis. Efficient statistics, in high dimensions, from truncated samples. In Foundations of Computer Science (FOCS), pages 639–649, 2018.
  • [DGTZ19] C. Daskalakis, T. Gouleakis, C. Tzamos, and M. Zampetakis. Computationally and statistically efficient truncated regression. In Conference on Learning Theory, pages 955–960. PMLR, 2019.
  • [DK22] I. Diakonikolas and D. Kane. Near-optimal statistical query hardness of learning halfspaces with massart noise. In Conference on Learning Theory, volume 178 of Proceedings of Machine Learning Research, pages 4258–4282. PMLR, 2022. Full version available at https://arxiv.longhoe.net/abs/2012.09720.
  • [DK23] I. Diakonikolas and D. M. Kane. Algorithmic high-dimensional robust statistics. Cambridge university press, 2023.
  • [DKK+22] I. Diakonikolas, D. M. Kane, V. Kontonis, C. Tzamos, and N. Zarifis. Learning general halfspaces with general massart noise under the gaussian distribution. In STOC ’22: 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 874–885, 2022. Full version available at https://arxiv.longhoe.net/abs/2108.08767.
  • [DKKZ20] I. Diakonikolas, D. M. Kane, V. Kontonis, and N. Zarifis. Algorithms and SQ lower bounds for PAC learning one-hidden-layer relu networks. In Conference on Learning Theory, COLT 2020, volume 125 of Proceedings of Machine Learning Research, pages 1514–1539. PMLR, 2020.
  • [DKP+21] I. Diakonikolas, D. M. Kane, A. Pensia, T. Pittas, and A. Stewart. Statistical query lower bounds for list-decodable linear regression. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, pages 3191–3204, 2021.
  • [DKPZ21] I. Diakonikolas, D. M. Kane, T. Pittas, and N. Zarifis. The optimality of polynomial regression for agnostic learning under gaussian marginals in the sq model. In Conference on Learning Theory, pages 1552–1584. PMLR, 2021.
  • [DKPZ23] I. Diakonikolas, D. M. Kane, T. Pittas, and N. Zarifis. SQ lower bounds for learning mixtures of separated and bounded covariance gaussians. In The Thirty Sixth Annual Conference on Learning Theory, COLT 2023, volume 195 of Proceedings of Machine Learning Research, pages 2319–2349. PMLR, 2023.
  • [DKRS23] I. Diakonikolas, D. Kane, L. Ren, and Y. Sun. Sq lower bounds for non-gaussian component analysis with weaker assumptions. In NeurIPS, volume 36, pages 4199–4212, 2023.
  • [DKS17] I. Diakonikolas, D. M. Kane, and A. Stewart. Statistical query lower bounds for robust estimation of high-dimensional gaussians and gaussian mixtures. In 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, pages 73–84, 2017. Full version at http://arxiv.longhoe.net/abs/1611.03473.
  • [DKS18] I. Diakonikolas, D. M. Kane, and A. Stewart. List-decodable robust mean estimation and learning mixtures of spherical gaussians. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, pages 1047–1060, 2018. Full version available at https://arxiv.longhoe.net/abs/1711.07211.
  • [DKS19] I. Diakonikolas, W. Kong, and A. Stewart. Efficient algorithms and lower bounds for robust linear regression. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, pages 2745–2754, 2019.
  • [DKS23] I. Diakonikolas, D. M. Kane, and Y. Sun. SQ lower bounds for learning mixtures of linear classifiers. CoRR, abs/2310.11876, 2023. Conference version in NeurIPS 2023.
  • [DKTZ21] C. Daskalakis, V. Kontonis, C. Tzamos, and M. Zampetakis. A statistical taylor theorem and extrapolation of truncated densities. In Conference on Learning Theory, pages 1395–1398. PMLR, 2021.
  • [DKZ20] I. Diakonikolas, D. Kane, and N. Zarifis. Near-optimal SQ lower bounds for agnostically learning halfspaces and relus under gaussian marginals. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
  • [DRZ20] C. Daskalakis, D. Rohatgi, and M. Zampetakis. Truncated linear regression in high dimensions. Advances in Neural Information Processing Systems, 33:10338–10347, 2020.
  • [DSYZ21] C. Daskalakis, P. Stefanou, R. Yao, and M. Zampetakis. Efficient truncated linear regression with unknown noise variance. Advances in Neural Information Processing Systems, 34:1952–1963, 2021.
  • [FGR+13] V. Feldman, E. Grigorescu, L. Reyzin, S. Vempala, and Y. Xiao. Statistical algorithms and a lower bound for detecting planted cliques. In Proceedings of STOC’13, pages 655–664, 2013. Full version in Journal of the ACM, 2017.
  • [Fis31] R. A. Fisher. Properties and applications of hh functions. Mathematical tables, 1(815-852):2, 1931.
  • [FKT20] D. Fotakis, A. Kalavasis, and C. Tzamos. Efficient parameter estimation of truncated boolean product distributions. In Conference on Learning Theory, pages 1586–1600. PMLR, 2020.
  • [Gal98] F. Galton. An examination into the registered speeds of american trotting horses, with remarks on their value as hereditary data. Proceedings of the Royal Society of London, 62(379-387):310–315, 1898.
  • [GGK20] S. Goel, A. Gollakota, and A. R. Klivans. Statistical-query lower bounds via functional gradients. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, 2020.
  • [Hop18] S. B. Hopkins. Statistical inference and the sum of squares method. PhD thesis, Cornell University, 2018.
  • [IZD20] A. Ilyas, M. Zampetakis, and C. Daskalakis. A theoretical and practical framework for regression and classification from truncated samples. In International Conference on Artificial Intelligence and Statistics, pages 4463–4473. PMLR, 2020.
  • [Kan11] D. M. Kane. The gaussian surface area and noise sensitivity of degree-d polynomial threshold functions. Computational Complexity, 20(2):389–412, 2011.
  • [Kea98] M. J. Kearns. Efficient noise-tolerant learning from statistical queries. Journal of the ACM, 45(6):983–1006, 1998.
  • [KOS08] A. Klivans, R. O’Donnell, and R. Servedio. Learning geometric concepts via Gaussian surface area. In Proc. 49th IEEE Symposium on Foundations of Computer Science (FOCS), pages 541–550, 2008.
  • [KTZ19] V. Kontonis, C. Tzamos, and M. Zampetakis. Efficient truncated statistics with unknown truncation. In 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS), pages 1578–1595. IEEE, 2019.
  • [KWB22] D. Kunisky, A. S. Wein, and A. S. Bandeira. Notes on computational hardness of hypothesis testing: Predictions using the low-degree likelihood ratio. In ISAAC Congress (International Society for Analysis, its Applications and Computation), pages 1–50. Springer, 2022.
  • [Lee14] A. Lee. Table of the gaussian "tail" functions; when the "tail" is larger than the body. Biometrika, 10(2/3):208–214, 1914.
  • [LWZ23] J. Lee, A. Wibisono, and M. Zampetakis. Learning exponential families from truncated samples. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
  • [Nee14] J. Neeman. Testing surface area with arbitrary accuracy. In Symposium on Theory of Computing, STOC 2014, 2014, pages 393–397. ACM, 2014.
  • [Pea02] K. Pearson. On the systematic fitting of curves to observations and measurements: part ii. Biometrika, 2(1):1–23, 1902.
  • [PL08] K. Pearson and A. Lee. On the generalised probable error in multiple normal correlation. Biometrika, 6(1):59–68, 1908.
  • [Ple21] O. Plevrakis. Learning from censored and dependent data: The case of linear dynamics. In Conference on Learning Theory, pages 3771–3787. PMLR, 2021.
  • [Sch86] H. Schneider. Truncated and censored samples from normal populations. Marcel Dekker, Inc., 1986.
  • [Sze67] G. Szegö. Orthogonal Polynomials. Number 23 in American Mathematical Society colloquium publications. American Mathematical Society, 1967.

Appendix

Appendix A Additional Preliminaries

Additional Notation

We use $a\lesssim b$ to denote that there exists an absolute constant $C>0$ (independent of the variables or parameters on which $a$ and $b$ depend) such that $a\leq Cb$. We write $a\ll b$ to denote that $a\leq cb$ for a sufficiently small absolute constant $c>0$.

Legendre Polynomials

In this work, we make use of the Legendre polynomials, which are orthogonal polynomials over $[-1,1]$. Some of their properties are:

Fact A.1 ([Sze67]).

The Legendre polynomials $P_{k}$, for $k\in\mathbb{Z}$, satisfy the following properties:

  1. $P_{k}$ is a $k$-degree polynomial, with $P_{0}(x)=1$ and $P_{1}(x)=x$.

  2. $\int_{-1}^{1}P_{i}(x)P_{j}(x)\,\mathrm{d}x=\frac{2}{2i+1}\mathds{1}\{i=j\}$, for all $i,j\in\mathbb{Z}$.

  3. $|P_{k}(x)|\leq 1$ for all $|x|\leq 1$.

  4. $P_{k}(x)=(-1)^{k}P_{k}(-x)$.

  5. $P_{k}(x)=2^{-k}\sum_{i=0}^{\lfloor k/2\rfloor}(-1)^{i}\binom{k}{i}\binom{2k-2i}{k}x^{k-2i}$.
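The listed properties can be checked in exact rational arithmetic for small degrees. The sketch below is an illustration only and makes two assumptions not taken from the text: it builds $P_{k}$ with Bonnet's recurrence $(n+1)P_{n+1}(x)=(2n+1)xP_{n}(x)-nP_{n-1}(x)$, and it compares the result against the standard closed form $P_{k}(x)=2^{-k}\sum_{i=0}^{\lfloor k/2\rfloor}(-1)^{i}\binom{k}{i}\binom{2k-2i}{k}x^{k-2i}$. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def legendre_coeffs(k):
    """Coefficients c[0..k] of P_k (c[j] multiplies x**j), built exactly via
    Bonnet's recurrence (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1}."""
    prev, cur = [Fraction(1)], [Fraction(0), Fraction(1)]
    if k == 0:
        return prev
    for n in range(1, k):
        shifted = [Fraction(0)] + cur  # coefficients of x * P_n
        padded_prev = prev + [Fraction(0)] * (len(shifted) - len(prev))
        nxt = [((2 * n + 1) * a - n * b) / (n + 1)
               for a, b in zip(shifted, padded_prev)]
        prev, cur = cur, nxt
    return cur


def closed_form_coeffs(k):
    """Coefficients of P_k from the standard closed form with alternating signs."""
    c = [Fraction(0)] * (k + 1)
    for i in range(k // 2 + 1):
        c[k - 2 * i] = Fraction((-1) ** i * comb(k, i) * comb(2 * k - 2 * i, k), 2 ** k)
    return c


def inner_product(p, q):
    """Exact integral of p(x) q(x) over [-1, 1] for coefficient lists p and q."""
    total = Fraction(0)
    for a, ca in enumerate(p):
        for b, cb in enumerate(q):
            if (a + b) % 2 == 0:  # odd powers integrate to zero on [-1, 1]
                total += ca * cb * Fraction(2, a + b + 1)
    return total


def evaluate(p, x):
    """Horner evaluation of the polynomial with coefficient list p at x."""
    result = 0
    for c in reversed(p):
        result = result * x + c
    return result
```

For example, `legendre_coeffs(2)` gives the coefficients of $(3x^{2}-1)/2$, and `inner_product` applied to two such lists reproduces the orthogonality relation in item 2.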

Additional Background on the SQ Model

We now record additional definitions and facts from [FGR+13] that are relevant to the SQ model.

Definition A.2 (Pairwise Correlation).

The pairwise correlation of two distributions with probability density functions $D_{1},D_{2}:\mathbb{R}^{d}\to\mathbb{R}_{+}$ with respect to a distribution with density $D:\mathbb{R}^{d}\to\mathbb{R}_{+}$, where the support of $D$ contains the supports of $D_{1}$ and $D_{2}$, is defined as $\chi_{D}(D_{1},D_{2})=\int_{\mathbb{R}^{d}}D_{1}(\mathbf{x})D_{2}(\mathbf{x})/D(\mathbf{x})\,\mathrm{d}\mathbf{x}-1$.

Definition A.3 ($\chi^{2}$-divergence).

The $\chi^{2}$-divergence between $D_{1},D_{2}:\mathbb{R}^{d}\to\mathbb{R}_{+}$ is defined as $\chi^{2}(D_{1},D_{2})=\int_{\mathbb{R}^{d}}D_{1}^{2}(\mathbf{x})/D_{2}(\mathbf{x})\,\mathrm{d}\mathbf{x}-1$.
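To make the two definitions concrete, here is a small numerical sketch in one dimension. It is an illustration under stated assumptions rather than anything used in the paper: the reference density is $D=\mathcal{N}(0,1)$, the two distributions are unit-variance Gaussians with means $a$ and $b$, and the integral is approximated by the trapezoidal rule on a truncated interval. For this family, completing the square gives $\chi_{D}(D_{1},D_{2})=e^{ab}-1$, which the code can be compared against. Function names and grid parameters are hypothetical.

```python
import math


def gaussian_pdf(x, mean=0.0):
    """Density of N(mean, 1) at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def pairwise_correlation(mean1, mean2, half_width=12.0, steps=24000):
    """Approximates chi_D(D1, D2) for D = N(0,1), D1 = N(mean1,1), D2 = N(mean2,1)
    with the trapezoidal rule on [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    total = 0.0
    for j in range(steps + 1):
        x = -half_width + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * gaussian_pdf(x, mean1) * gaussian_pdf(x, mean2) / gaussian_pdf(x)
    return total * h - 1.0


def chi_square_divergence(mean1):
    """chi^2(N(mean1,1), N(0,1)): the pairwise correlation of D1 with itself."""
    return pairwise_correlation(mean1, mean1)
```

The $\chi^{2}$-divergence is the special case $D_{1}=D_{2}$ of the pairwise correlation, so for this family it equals $e^{a^{2}}-1$.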

Definition A.4.

We say that a set of $s$ distributions $\mathcal{D}=\{D_{1},\ldots,D_{s}\}$ is $(\gamma,\beta)$-correlated relative to a distribution $D$ if $|\chi_{D}(D_{i},D_{j})|\leq\gamma$ for all $i\neq j$, and $|\chi_{D}(D_{i},D_{j})|\leq\beta$ for $i=j$.
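As a minimal sketch of this definition, assume the pairwise correlations $\chi_{D}(D_{i},D_{j})$ have already been computed and stored in a square matrix `chi` (a hypothetical input format chosen for illustration). The check then reduces to comparing off-diagonal entries with $\gamma$ and diagonal entries with $\beta$.

```python
def is_correlated(chi, gamma, beta):
    """Returns True if the matrix chi of pairwise correlations chi_D(D_i, D_j)
    certifies that the family is (gamma, beta)-correlated relative to D."""
    n = len(chi)
    for i in range(n):
        for j in range(n):
            bound = beta if i == j else gamma  # diagonal vs. off-diagonal bound
            if abs(chi[i][j]) > bound:
                return False
    return True
```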

Definition A.5 (Decision Problem over Distributions).

Let $D$ be a fixed distribution and $\mathcal{D}$ be a distribution family. We denote by $\mathcal{B}(\mathcal{D},D)$ the decision (or hypothesis testing) problem in which the input distribution $D'$ is promised to satisfy either (a) $D'=D$ or (b) $D'\in\mathcal{D}$, and the goal is to distinguish between the two cases.

Definition A.6 (Statistical Query Dimension).

Let $\beta,\gamma>0$. Consider a decision problem $\mathcal{B}(\mathcal{D},D)$, where $D$ is a fixed distribution and $\mathcal{D}$ is a family of distributions. Define $s$ to be the maximum integer such that there exists a finite set of distributions $\mathcal{D}_{D}\subseteq\mathcal{D}$ such that $\mathcal{D}_{D}$ is $(\gamma,\beta)$-correlated relative to $D$ and $|\mathcal{D}_{D}|\geq s$. The Statistical Query dimension with pairwise correlations $(\gamma,\beta)$ of $\mathcal{B}$ is defined as $s$ and denoted as $\mathrm{SD}(\mathcal{B},\gamma,\beta)$.

Lemma A.7 (Corollary 3.12 in [FGR+13]).

Let $\mathcal{B}(\mathcal{D},D)$ be a decision problem. For $\gamma,\beta>0$, let $s=\mathrm{SD}(\mathcal{B},\gamma,\beta)$. For any $\gamma'>0$, any SQ algorithm for $\mathcal{B}$ requires queries of tolerance at most $\sqrt{\gamma+\gamma'}$ or makes at least $s\gamma'/(\beta-\gamma)$ queries.
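The two quantities in the lemma are simple to evaluate once $s$, $\gamma$, $\beta$, and $\gamma'$ are fixed. The helper below is a hypothetical convenience function, not something from the paper; it additionally assumes $\beta>\gamma$ so that the query bound is positive and well defined.

```python
import math


def sq_lower_bound(s, gamma, beta, gamma_prime):
    """Returns (tolerance, num_queries): any SQ algorithm either uses some query
    of tolerance at most `tolerance` or makes at least `num_queries` queries.
    Assumes beta > gamma > 0 and gamma_prime > 0."""
    if not (beta > gamma > 0 and gamma_prime > 0):
        raise ValueError("need beta > gamma > 0 and gamma_prime > 0")
    tolerance = math.sqrt(gamma + gamma_prime)
    num_queries = s * gamma_prime / (beta - gamma)
    return tolerance, num_queries
```

For instance, with $s=1000$, $\gamma=\gamma'=0.01$, and $\beta=0.51$, the dichotomy reads: tolerance at most $\sqrt{0.02}$, or at least $20$ queries.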

We need the following result from [DKPZ21] that upper bounds the correlation between two such distributions.

Lemma A.8 (Corollary 2.4 in [DKPZ21]).

Let $A$ be a distribution over $\mathbb{R}^m$ such that the first $k$ moments of $A$ match the corresponding moments of $\mathcal{N}(\mathbf{0},\mathbf{I}_m)$. For matrices $\mathbf{U},\mathbf{V}\in\mathbb{R}^{m\times d}$ such that $\mathbf{U}\mathbf{U}^\top=\mathbf{V}\mathbf{V}^\top=\mathbf{I}_m$, define $P_{A,\mathbf{U}}$ and $P_{A,\mathbf{V}}$ to be distributions over $\mathbb{R}^d$ according to Definition 2.3.
Then, the following holds: $|\chi_{\mathcal{N}(\mathbf{0},\mathbf{I}_d)}(P_{A,\mathbf{U}},P_{A,\mathbf{V}})|\leq\|\mathbf{U}\mathbf{V}^\top\|_{\mathrm{op}}^{k+1}\,\chi^2(A,\mathcal{N}(\mathbf{0},\mathbf{I}_m))$.

A.1 Proof of Fact 2.4

We restate and prove the following fact.

Fact A.9.

Let $d,k\in\mathbb{Z}$ and $m<d^{1/10}$. Let $A$ be a distribution over $\mathbb{R}^m$ such that the first $k$ moments of $A$ match the corresponding moments of $\mathcal{N}(\mathbf{0},\mathbf{I}_m)$. Let $\mathcal{D}$ be the set of distributions constructed as follows: for each matrix $\mathbf{U}\in\mathbb{R}^{m\times d}$ such that $\mathbf{U}\mathbf{U}^\top=\mathbf{I}_m$, the set $\mathcal{D}$ contains the distribution $P_{A,\mathbf{U}}$ over $\mathbb{R}^d$ defined according to Definition 2.3.
Then, any statistical query algorithm that solves the decision problem $\mathcal{B}(\mathcal{D},\mathcal{N}(\mathbf{0},\mathbf{I}_d))$ either requires $2^{d^{\Omega(1)}}$ queries or performs at least one query with tolerance $d^{-\Omega(k)}\chi^2(A,\mathcal{N}(\mathbf{0},\mathbf{I}_m))$.

Proof.

Recall the definition of decision problems (Definition A.5). Consider the decision problem $\mathcal{B}(\mathcal{D},D)$, where $D=\mathcal{N}(\mathbf{0},\mathbf{I}_d)$ and $\mathcal{D}$ is the alternative hypothesis class defined above. We now lower bound the SQ dimension (Definition A.6) of $\mathcal{B}(\mathcal{D},D)$. Let $S$ be the set of matrices from the fact below.

Fact A.10 (See, e.g., Lemma 17 in [DKPZ21]).

Let $m,d\in\mathbb{N}$ with $m<d^{1/10}$. There exists a set $S$ of $2^{d^{\Omega(1)}}$ matrices in $\mathbb{R}^{m\times d}$ such that every $\mathbf{U}\in S$ satisfies $\mathbf{U}\mathbf{U}^\top=\mathbf{I}_m$ and every pair $\mathbf{U},\mathbf{V}\in S$ with $\mathbf{U}\neq\mathbf{V}$ satisfies $\|\mathbf{U}\mathbf{V}^\top\|_{\mathrm{F}}\leq O(d^{-1/10})$.
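As a rough numerical illustration of the near-orthogonality in Fact A.10, the sketch below draws two independent random matrices with orthonormal rows and compares $\|\mathbf{U}\mathbf{V}^\top\|_{\mathrm{F}}$ with $\|\mathbf{U}\mathbf{U}^\top\|_{\mathrm{F}}=\sqrt{m}$. This is only a heuristic sanity check under assumptions made here (Gaussian sampling followed by Gram-Schmidt, and the hypothetical helper names `random_orthonormal_rows` and `frobenius_of_product`); it does not construct the exponentially large set $S$ and is not the argument of [DKPZ21].

```python
import math
import random

def random_orthonormal_rows(m, d, rng):
    """Sample an m x d matrix with orthonormal rows via Gram-Schmidt
    applied to i.i.d. Gaussian vectors (assumes m <= d)."""
    rows = []
    for _ in range(m):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for u in rows:  # remove the components along earlier rows
            c = sum(a * b for a, b in zip(u, v))
            v = [b - c * a for a, b in zip(u, v)]
        norm = math.sqrt(sum(b * b for b in v))
        rows.append([b / norm for b in v])
    return rows

def frobenius_of_product(U, V):
    """Frobenius norm of U V^T, with U and V given as lists of rows."""
    return math.sqrt(sum(
        sum(a * b for a, b in zip(u, v)) ** 2 for u in U for v in V))

rng = random.Random(0)
m, d = 2, 2000                      # satisfies m < d**(1/10)
U = random_orthonormal_rows(m, d, rng)
V = random_orthonormal_rows(m, d, rng)
cross = frobenius_of_product(U, V)  # small for independent draws
self_ = frobenius_of_product(U, U)  # sqrt(m), since U U^T = I_m
```

Since the operator norm never exceeds the Frobenius norm, a small value of `cross` also bounds $\|\mathbf{U}\mathbf{V}^\top\|_{\mathrm{op}}$, which is the quantity that enters Lemma A.8.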

Let $\mathcal{D}_D:=\{P_{A,\mathbf{V}}\}_{\mathbf{V}\in S}$ for the distribution $A$.

Using Fact A.10 and Lemma A.8, we have that for any distinct $\mathbf{V},\mathbf{U}\in S$,

$$|\chi_{\mathcal{N}(\mathbf{0},\mathbf{I}_d)}(P_{A,\mathbf{U}},P_{A,\mathbf{V}})|\leq\left\|\mathbf{U}\mathbf{V}^\top\right\|_{\mathrm{op}}^{k+1}\chi^2(A,\mathcal{N}(\mathbf{0},\mathbf{I}_m))\leq\Omega(d)^{-(k+1)/10}\,\chi^2(A,\mathcal{N}(\mathbf{0},\mathbf{I}_m))\;,\tag{25}$$

where we used that $\|\mathbf{A}\|_{\mathrm{op}}\leq\|\mathbf{A}\|_{\mathrm{F}}$ for any matrix $\mathbf{A}$. On the other hand, when $\mathbf{V}=\mathbf{U}$, we have that $|\chi_{\mathcal{N}(\mathbf{0},\mathbf{I}_d)}(P_{A,\mathbf{U}},P_{A,\mathbf{V}})|\leq\chi^2(A,\mathcal{N}(\mathbf{0},\mathbf{I}_m))$.
Thus, the family $\mathcal{D}_D$ is $(\gamma,\beta)$-correlated with $\gamma=\Omega(d)^{-(k+1)/10}\chi^2(A,\mathcal{N}(\mathbf{0},\mathbf{I}_m))$ and $\beta=\chi^2(A,\mathcal{N}(\mathbf{0},\mathbf{I}_m))$ with respect to $D=\mathcal{N}(\mathbf{0},\mathbf{I}_d)$. This means that $\mathrm{SD}(\mathcal{B}(\mathcal{D},D),\gamma,\beta)\geq\exp(d^{\Omega(1)})$.
Therefore, by applying Lemma A.7 with $\gamma':=\gamma=\Omega(d)^{-(k+1)/10}\chi^2(A,\mathcal{N}(\mathbf{0},\mathbf{I}_m))$, we obtain that any SQ algorithm for $\mathcal{B}(\mathcal{D},D)$ requires at least $\exp(d^{\Omega(1)})\,d^{-O(k)}$ calls to

$$\mathrm{STAT}\left(\Omega(d)^{-\Omega(k)}\chi^2(A,\mathcal{N}(\mathbf{0},\mathbf{I}_m))\right)\;.$$

Appendix B Omitted Proofs from Section 4

B.1 Proof of Proposition 4.3

We restate and prove the following; see Proposition 4.3.

Proof.

Note that we can always transform the function $g_\eta:\mathbb{R}\to\{a,b\}$ into a function $g_\eta':\mathbb{R}\to\{\pm 1\}$ that satisfies similar properties. We define $g_\eta'(z)\stackrel{\mathrm{def}}{=}(2g_\eta(z)-a-b)/(b-a)$ and let $\nu_t'=2\nu_t/(b-a)-\frac{a+b}{b-a}\operatorname*{\mathbf{E}}_{z\sim D}[z^t]$ and $\eta'=2\eta/(b-a)$, so that $\operatorname*{\mathbf{E}}_{z\sim D}[g_\eta'(z)z^t]-\nu_t'=\frac{2}{b-a}\left(\operatorname*{\mathbf{E}}_{z\sim D}[g_\eta(z)z^t]-\nu_t\right)$.
Hence, we have that for any $\eta'>0$, there exists an at most $\ell$-piecewise constant function $g'_{\eta'}:\mathbb{R}\to\{\pm 1\}$ such that $|\operatorname*{\mathbf{E}}_{z\sim D}[g'_{\eta'}(z)z^t]-\nu_t'|\leq\eta'$ for every non-negative integer $t<k$. By applying Lemma B.2 (repeatedly, until at most $k+1$ pieces remain) and then Lemma B.1, we obtain that there exists an at most $(k+1)$-piecewise constant function $f':\mathbb{R}\to\{\pm 1\}$ such that $\operatorname*{\mathbf{E}}_{z\sim D}[f'(z)z^t]=\nu'_t$ for every non-negative integer $t<k$.
By setting $f(z)=(f'(z)(b-a)+a+b)/2$, we complete the proof of Proposition 4.3. ∎
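The affine change of variables in this proof can be checked mechanically. The sketch below, written under assumptions made only for illustration (a finitely supported stand-in for $D$, hypothetical values $a=0$ and $b=3$, and a two-piece $g$), verifies with exact rational arithmetic that the rescaled function takes values in $\{\pm 1\}$, that $\mathbf{E}[g'(z)z^t]=(2\mathbf{E}[g(z)z^t]-(a+b)\mathbf{E}[z^t])/(b-a)$, and that the map $f'\mapsto(f'(b-a)+a+b)/2$ inverts the rescaling. The identity is linear in the distribution, so using a discrete stand-in does not affect it.

```python
from fractions import Fraction

def moment(h, support, probs, t):
    """E[h(z) z^t] for a finitely supported stand-in distribution."""
    return sum(p * h(z) * z**t for z, p in zip(support, probs))

a, b = Fraction(0), Fraction(3)          # hypothetical values taken by g
support = [Fraction(-2), Fraction(-1), Fraction(1), Fraction(2)]
probs = [Fraction(1, 4)] * 4

g = lambda z: a if z < 0 else b                      # 2-piecewise constant, values in {a, b}
g_pm = lambda z: (2 * g(z) - a - b) / (b - a)        # rescaled to {-1, +1}
g_back = lambda z: (g_pm(z) * (b - a) + a + b) / 2   # inverse of the rescaling

for t in range(3):
    lhs = moment(g_pm, support, probs, t)
    rhs = (2 * moment(g, support, probs, t)
           - (a + b) * moment(lambda z: 1, support, probs, t)) / (b - a)
    assert lhs == rhs  # the moment identity behind the definition of nu'_t
```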

Lemma B.1.

Let $k$ be a positive integer. Let $D$ be a continuous distribution over $\mathbb{R}$ and let $\nu_0,\ldots,\nu_{k-1}\in\mathbb{R}$. If for any $\eta>0$ there exists an at most $(k+1)$-piecewise constant function $g_\eta:\mathbb{R}\to\{\pm 1\}$ such that $|\operatorname*{\mathbf{E}}_{z\sim D}[g_\eta(z)z^t]-\nu_t|\leq\eta$ for every non-negative integer $t<k$, then there exists an at most $(k+1)$-piecewise constant function $f:\mathbb{R}\to\{\pm 1\}$ such that $\operatorname*{\mathbf{E}}_{z\sim D}[f(z)z^t]=\nu_t$ for every non-negative integer $t<k$.

Lemma B.1 follows from the above using a compactness argument.

Proof.

Let $p(z)$ be the pdf of $D$. For every $\eta>0$, there exists an at most $(k+1)$-piecewise constant function $g_\eta$ such that $|\operatorname*{\mathbf{E}}_{z\sim D}[g_\eta(z)z^t]-\nu_t|\leq\eta$ for every non-negative integer $t<k$. Let $\mathbf{M}:\overline{\mathbb{R}}^{k}\to\mathbb{R}^{k}$, where $M_i(\mathbf{b})=\sum_{n=0}^{k}(-1)^{n+1}\int_{b_n}^{b_{n+1}}z^i p(z)\,\mathrm{d}z$ for $0\leq i<k$, with $b_1\leq b_2\leq\ldots\leq b_k$, $b_0=-\infty$, and $b_{k+1}=\infty$. Here we assume without loss of generality that the function is negative before the first breakpoint, because we can always set the first breakpoint to be $-\infty$. The function $\mathbf{M}$ is a continuous map and $\overline{\mathbb{R}}^{k}$ is a compact set, thus $\mathbf{M}(\overline{\mathbb{R}}^{k})$ is a compact set. We also have that for every $\eta>0$ there is a point $\mathbf{b}\in\overline{\mathbb{R}}^{k}$ such that $|\mathbf{M}(\mathbf{b})\cdot\mathbf{e}_i-\nu_i|\leq\eta$ for all $i<k$. Hence the vector $(\nu_0,\ldots,\nu_{k-1})$ is a limit point of the image of $\mathbf{M}$, and since this image is compact (and therefore closed), there exists a point $\mathbf{b}^*\in\overline{\mathbb{R}}^{k}$ such that $\mathbf{M}(\mathbf{b}^*)=(\nu_0,\ldots,\nu_{k-1})$. This completes the proof. ∎

The following lemma is similar to the main lemma of [DKZ20]; we provide the proof for completeness, since in our case the distributions are more general and we want specific values for their moments.

Lemma B.2.

Let $m$ and $k$ be positive integers such that $m>k+1$ and $\eta>0$. Let $D$ be a continuous distribution over $\mathbb{R}$ and let $\nu_0,\ldots,\nu_{k-1}\in\mathbb{R}$. If there exists an $m$-piecewise constant $f:\mathbb{R}\to\{\pm 1\}$ such that $|\operatorname*{\mathbf{E}}_{z\sim D}[f(z)z^t]-\nu_t|<\eta$ for all non-negative integers $t<k$, then there exists an at most $(m-1)$-piecewise constant $g:\mathbb{R}\to\{\pm 1\}$ such that $|\operatorname*{\mathbf{E}}_{z\sim D}[g(z)z^t]-\nu_t|<\eta$ for all non-negative integers $t<k$.

Proof.

Let $p(z)$ be the pdf of $D$. Let $\{b_1,b_2,\ldots,b_{m-1}\}$ be the breakpoints of $f$, i.e., the points where the function $f$ changes value. Then let $F(z_1,z_2,\ldots,z_{m-1},z):\overline{\mathbb{R}}^{m}\to\mathbb{R}$ be an $m$-piecewise constant function (in $z$) with breakpoints at $z_1,\ldots,z_{m-1}$, where $z_1<z_2<\ldots<z_{m-1}$ and $F(b_1,b_2,\ldots,b_{m-1},z)=f(z)$.
For simplicity, let $\mathbf{z}=(z_1,\ldots,z_{m-1})$, define $M_i(\mathbf{z})=\operatorname*{\mathbf{E}}_{z\sim D}[F(\mathbf{z},z)z^i]$, and let $\mathbf{M}(\mathbf{z})=[M_0(\mathbf{z}),M_1(\mathbf{z}),\ldots,M_{k-1}(\mathbf{z})]^T$.
It is clear from the definition that $M_i(\mathbf{z})=\sum_{n=0}^{m-1}\int_{z_n}^{z_{n+1}}F(\mathbf{z},z)z^i p(z)\,\mathrm{d}z=\sum_{n=0}^{m-1}a_n\int_{z_n}^{z_{n+1}}z^i p(z)\,\mathrm{d}z$, where $z_0=-\infty$, $z_m=\infty$, and $a_n$ is the sign of $F(\mathbf{z},z)$ in the interval $(z_n,z_{n+1})$.
Note that $a_n=-a_{n+1}$ for every $0\leq n<m-1$. By taking the derivative of $M_i$ with respect to $z_j$, for $0<j<m$, we get that

$$\frac{\partial}{\partial z_j}M_i(\mathbf{z})=2a_{j-1}z_j^i p(z_j)\quad\text{and}\quad\frac{\partial}{\partial z_j}\mathbf{M}(\mathbf{z})=2a_{j-1}p(z_j)\,[1,z_j^1,\ldots,z_j^{k-1}]^T\;.$$
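The derivative formula can be sanity-checked numerically. The sketch below compares it with a central finite difference for $i\in\{0,1\}$, under assumptions made only for this illustration: $D$ is taken to be the standard Gaussian (so the partial moments have closed forms), the sign before the first breakpoint is $a_0=-1$, and the breakpoint values and helper names (`phi`, `Phi`, `partial_moment`, `M`) are hypothetical.

```python
import math

def phi(z):
    """Standard Gaussian density, used here as a stand-in for p."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def Phi(z):
    """Standard Gaussian CDF; also valid at +/- infinity."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def partial_moment(lo, hi, i):
    """Integral of z^i * phi(z) over [lo, hi], for i in {0, 1}."""
    if i == 0:
        return Phi(hi) - Phi(lo)
    return phi(lo) - phi(hi)

def M(i, z, a0=-1):
    """M_i(z) for breakpoints z, with alternating signs a_n = a0 * (-1)^n."""
    pts = [-math.inf] + list(z) + [math.inf]
    return sum(a0 * (-1) ** n * partial_moment(pts[n], pts[n + 1], i)
               for n in range(len(pts) - 1))

z = [-1.0, 0.3, 1.2]   # m - 1 = 3 breakpoints, so m = 4 pieces
h = 1e-6
for i in (0, 1):
    for j in (1, 2, 3):                       # 1-based index, as in the text
        zp = list(z); zp[j - 1] += h
        zm = list(z); zm[j - 1] -= h
        numeric = (M(i, zp) - M(i, zm)) / (2 * h)
        a_prev = -1 * (-1) ** (j - 1)         # a_{j-1}
        closed = 2 * a_prev * z[j - 1] ** i * phi(z[j - 1])
        assert abs(numeric - closed) < 1e-6
```

The factor $2$ arises because moving $z_j$ simultaneously enlarges the interval with sign $a_{j-1}$ and shrinks the adjacent interval with sign $a_j=-a_{j-1}$.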

We now argue that for any $\mathbf{z}$ with distinct coordinates there exists a vector $\mathbf{u}\in\mathbb{R}^{m-1}$ of the form $\mathbf{u}=(\mathbf{u}_1,\ldots,\mathbf{u}_k,0,0,\ldots,0,1)$ such that the directional derivative of $\mathbf{M}$ in the direction $\mathbf{u}$ is zero. To prove this, we set up a system of linear equations expressing $\nabla_{\mathbf{u}}M_i(\mathbf{z})=0$ for all $0\leq i<k$. Indeed, these conditions read $\sum_{j=1}^{k}\frac{\partial}{\partial z_j}M_i(\mathbf{z})\,\mathbf{u}_j=-\frac{\partial}{\partial z_{m-1}}M_i(\mathbf{z})$, or $\sum_{j=1}^{k}a_{j-1}z_j^{i}p(z_j)\mathbf{u}_j=-a_{m-2}z_{m-1}^{i}p(z_{m-1})$, which is linear in the variables $\mathbf{u}_j$. Let $\hat{\mathbf{u}}$ be the vector of the first $k$ variables and let $\mathbf{w}$ be the right-hand side of the system, i.e., $\mathbf{w}_i=-a_{m-2}z_{m-1}^{i}p(z_{m-1})$. Then this system can be written in matrix form as $\mathbf{V}\mathbf{D}\hat{\mathbf{u}}=\mathbf{w}$, where $\mathbf{V}$ is a Vandermonde matrix, i.e., a matrix with entries $\mathbf{V}_{i,j}=\alpha_i^{j-1}$ for some values $\alpha_i$, and $\mathbf{D}$ is a diagonal matrix.
In our case, $\mathbf{V}_{i,j}=z_i^{j-1}$ and $\mathbf{D}_{j,j}=2a_{j-1}p(z_j)$. It is known that the Vandermonde matrix has full rank iff $\alpha_i\neq\alpha_j$ for all $i\neq j$, which holds in our setting. Thus, the matrix $\mathbf{V}\mathbf{D}$ is nonsingular and the system has a solution. Hence, there exists a vector $\mathbf{u}$ with the desired properties and, moreover, any vector in this direction is a solution of this system of linear equations. Note that the vector $\mathbf{u}$ depends on the value of $\mathbf{z}$; we therefore let $\mathbf{u}(\mathbf{z})$ be the (continuous) function that returns a vector $\mathbf{u}$ given $\mathbf{z}$.
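
As a numerical illustration of the linear system above, the following sketch solves for $\hat{\mathbf{u}}$ with NumPy. It is not taken from the paper: the function name `solve_direction`, the use of the standard Gaussian density for $p$, and the example values of $\mathbf{z}$ and $a_j$ are assumptions made purely for illustration. The coefficient matrix is built directly from the displayed equations, with entry $(i,j)$ equal to $a_{j-1}z_j^{i}p(z_j)$.

```python
import numpy as np

def gaussian_pdf(x):
    """Standard normal density; stands in for p (an assumption)."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def solve_direction(z, a, k, p=gaussian_pdf):
    """Return u = (u_1, ..., u_k, 0, ..., 0, 1) solving, for 0 <= i < k,
    sum_{j<=k} a_{j-1} z_j^i p(z_j) u_j = -a_{m-2} z_{m-1}^i p(z_{m-1}).
    Requires k < len(z) and distinct entries in z."""
    z = np.asarray(z, dtype=float)   # breakpoints z_1, ..., z_{m-1}
    a = np.asarray(a, dtype=float)   # coefficients a_0, ..., a_{m-2}
    powers = np.arange(k)[:, None]   # exponents i = 0, ..., k-1
    # Vandermonde-type block: entry (i, j) is z_j^i for the first k breakpoints.
    vandermonde = z[None, :k] ** powers
    diagonal = np.diag(a[:k] * p(z[:k]))
    rhs = -a[-1] * p(z[-1]) * z[-1] ** np.arange(k)
    u = np.zeros(len(z))
    u[:k] = np.linalg.solve(vandermonde @ diagonal, rhs)
    u[-1] = 1.0
    return u

z = np.array([-1.0, 0.2, 0.9, 1.5])    # distinct breakpoints (hypothetical)
a = np.array([1.0, -1.0, 1.0, -1.0])   # nonzero coefficients (hypothetical)
u = solve_direction(z, a, k=2)

# Residual of the k linear constraints; it should vanish up to round-off.
gradient = (z[None, :] ** np.arange(2)[:, None]) * (a * gaussian_pdf(z))[None, :]
residual = gradient @ u
```

The solve succeeds here because the breakpoints are distinct and the diagonal entries are nonzero, which is exactly the nonsingularity condition used in the argument.
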
We define a differential equation for the function $\mathbf{v}:\overline{\mathbb{R}}\to\overline{\mathbb{R}}^{m-1}$ as follows: $\mathbf{v}(0)=\mathbf{b}$, where $\mathbf{b}=(b_1,\ldots,b_{m-1})$, and $\mathbf{v}'(T)=\mathbf{u}(\mathbf{v}(T))$ for all $T\in\overline{\mathbb{R}}$. If $\mathbf{v}$ is a solution to this differential equation, then we have:

$$\frac{\mathrm{d}}{\mathrm{d}T}\mathbf{M}(\mathbf{v}(T))=\frac{\mathrm{d}}{\mathrm{d}\mathbf{v}(T)}\mathbf{M}(\mathbf{v}(T))\,\frac{\mathrm{d}}{\mathrm{d}T}\mathbf{v}(T)=\frac{\mathrm{d}}{\mathrm{d}\mathbf{v}(T)}\mathbf{M}(\mathbf{v}(T))\,\mathbf{u}(\mathbf{v}(T))=\mathbf{0}\;,$$

where we used the chain rule and the fact that the directional derivative in the direction $\mathbf{u}(\mathbf{v}(T))$ is zero. This means that the function $\mathbf{M}(\mathbf{v}(T))$ is constant, and for all $0\leq j<k$ we have $|M_j-\nu_j|<\eta$, because $|\operatorname*{\mathbf{E}}_{z\sim D}[F(z_1,\ldots,z_{m-1},z)z^{t}]-\nu_t|<\eta$. Furthermore, since $\mathbf{u}(\mathbf{v}(T))$ is continuous in $\mathbf{v}(T)$, this differential equation is well defined and has a solution up until the point where either two of the $z_i$ approach each other or one of the $z_i$ approaches plus or minus infinity (the solution cannot oscillate, since $\mathbf{v}_{m-1}'(T)=1$ for all $T$). Running the differential equation until we reach such a limit, we find a limiting value $\mathbf{v}^{\ast}$ of $\mathbf{v}(T)$ so that either:

  1. There is an $i$ such that $\mathbf{v}_i^{\ast}=\mathbf{v}_{i+1}^{\ast}$, which gives us a function that is at most $(m-2)$-piecewise constant, namely $F(\mathbf{v}^{\ast},z)$.

  2. Either $\mathbf{v}_{m-1}^{\ast}=\infty$ or $\mathbf{v}_1^{\ast}=-\infty$, which gives us an at most $(m-1)$-piecewise constant function, namely $F(\mathbf{v}^{\ast},z)$. Indeed, when $\mathbf{v}_{m-1}^{\ast}=\infty$, the last breakpoint becomes $\infty$, so we have one less breakpoint, and when $\mathbf{v}_1^{\ast}=-\infty$ we lose the first breakpoint.

Thus, in either case we have a function with at most $m-1$ breakpoints and the same moments. This completes the proof. ∎
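
The moment-preserving flow used in this proof can be checked numerically on a toy instance. The sketch below is an illustration under stated assumptions, not the paper's construction: it takes $D$ to be the standard Gaussian, fixes $k=2$ (so only $M_0$ and $M_1$ are tracked), lets $F$ take hypothetical values `a[0], ..., a[m-1]` on the consecutive intervals cut out by the breakpoints, and computes $\partial M_i/\partial z_j$ from the jump of $F$ across $z_j$. The names `moments`, `direction`, and `flow` are hypothetical, and a hand-written RK4 integrator stands in for "running the differential equation".

```python
import math
import numpy as np

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def moments(z, a):
    """M_0 and M_1 of the piecewise-constant F under N(0, 1), in closed form."""
    cdf = [0.0] + [Phi(x) for x in z] + [1.0]
    pdf = [0.0] + [phi(x) for x in z] + [0.0]
    m0 = sum(a[j] * (cdf[j + 1] - cdf[j]) for j in range(len(a)))
    m1 = sum(a[j] * (pdf[j] - pdf[j + 1]) for j in range(len(a)))
    return np.array([m0, m1])

def direction(z, a, k=2):
    """u(z): first k entries solve the linear system, last entry is 1."""
    z = np.asarray(z, dtype=float)
    jumps = a[:-1] - a[1:]                      # jump of F across each breakpoint
    dens = np.array([phi(x) for x in z])
    powers = np.arange(k)[:, None]
    grad = (z[None, :] ** powers) * (jumps * dens)[None, :]  # dM_i / dz_j
    u = np.zeros(len(z))
    u[-1] = 1.0
    u[:k] = np.linalg.solve(grad[:, :k], -grad[:, -1])
    return u

def flow(z0, a, T, steps=200):
    """Integrate v'(T) = u(v(T)) from v(0) = z0 with classical RK4."""
    v = np.array(z0, dtype=float)
    h = T / steps
    for _ in range(steps):
        k1 = direction(v, a)
        k2 = direction(v + 0.5 * h * k1, a)
        k3 = direction(v + 0.5 * h * k2, a)
        k4 = direction(v + h * k3, a)
        v = v + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return v

a = np.array([1.0, -1.0, 1.0, -1.0])   # hypothetical values of F on the 4 pieces
z0 = np.array([-1.0, 0.0, 1.0])        # hypothetical starting breakpoints
z1 = flow(z0, a, T=0.2)
before, after = moments(z0, a), moments(z1, a)
```

In this toy run the last breakpoint advances at unit speed while `before` and `after` agree up to integration error, mirroring the claim that $\mathbf{M}(\mathbf{v}(T))$ is constant along the flow. The integration is stopped well before any two breakpoints meet.
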

Appendix C Lower Bounds for Low-Degree Polynomial Tests

We describe the implications of our SQ lower bounds for low-degree polynomial tests on the problem below:

Problem C.1.

Let $A$ be a distribution on $\mathbb{R}^m$. For a matrix $\mathbf{V}\in\mathbb{R}^{m\times d}$, we let $P_{A,\mathbf{V}}$ be the distribution as in Definition 2.3, i.e., the distribution that coincides with $A$ on the subspace spanned by the rows of $\mathbf{V}$ and is standard Gaussian in the orthogonal subspace. Let $S$ be the set of nearly orthogonal vectors from Fact A.10. Let $\mathcal{S}=\{P_{A,\mathbf{V}}\}_{\mathbf{V}\in S}$. We define the simple hypothesis testing problem where the null hypothesis is $\mathcal{N}(\mathbf{0},\mathbf{I}_d)$ and the alternative hypothesis is $P_{A,\mathbf{V}}$ for some $\mathbf{V}$ selected uniformly from $S$.

We now describe the model in more detail. We will consider tests that are thresholded polynomials of low degree, i.e., tests that output $H_1$ if the value of the polynomial exceeds a threshold and $H_0$ otherwise. We need the following notation and definitions. For a distribution $D$ over $\mathcal{X}$, we use $D^{\otimes n}$ to denote the joint distribution of $n$ i.i.d. samples from $D$. For two functions $f:\mathcal{X}\to\mathbb{R}$, $g:\mathcal{X}\to\mathbb{R}$ and a distribution $D$, we use $\langle f,g\rangle_D$ to denote the inner product $\operatorname*{\mathbf{E}}_{X\sim D}[f(X)g(X)]$. We use $\|f\|_D$ to denote $\sqrt{\langle f,f\rangle_D}$.
We say that a polynomial $f(x_1,\dots,x_n):\mathbb{R}^{n\times d}\to\mathbb{R}$ has sample-wise degree $(r,\ell)$ if each monomial uses at most $\ell$ different samples from $x_1,\dots,x_n$ and uses degree at most $r$ for each of them. Let $\mathcal{C}_{r,\ell}$ be the linear space of all polynomials of sample-wise degree $(r,\ell)$, equipped with the inner product defined above. For a function $f:\mathbb{R}^{n\times d}\to\mathbb{R}$, we use $f^{\leq r,\ell}$ to denote the orthogonal projection of $f$ onto $\mathcal{C}_{r,\ell}$ with respect to the inner product $\langle\cdot,\cdot\rangle_{D_0^{\otimes n}}$.
Finally, for the null distribution $D_0$ and a distribution $P$, define the likelihood ratio $\overline{P}^{\otimes n}(x):=P^{\otimes n}(x)/D_0^{\otimes n}(x)$.
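
To make the notion of sample-wise degree concrete, here is a small sketch that computes it for a monomial. The representation is an assumption chosen for illustration: a monomial is a dictionary mapping `(sample_index, coordinate)` pairs to exponents, a polynomial is a list of such monomials, and the helper names `samplewise_degree` and `in_C` are hypothetical.

```python
from collections import defaultdict

def samplewise_degree(monomial):
    """Return (largest degree used in any single sample,
               number of distinct samples the monomial touches)."""
    per_sample = defaultdict(int)
    for (sample, _coord), exponent in monomial.items():
        if exponent > 0:
            per_sample[sample] += exponent
    if not per_sample:
        return (0, 0)  # the constant monomial
    return (max(per_sample.values()), len(per_sample))

def in_C(polynomial, r, ell):
    """Check that every monomial has sample-wise degree at most (r, ell)."""
    return all(deg <= r and used <= ell
               for deg, used in map(samplewise_degree, polynomial))

# The monomial x_1[0]^2 * x_1[3] * x_4[1]: degree 3 in sample 1, degree 1 in sample 4.
mono = {(1, 0): 2, (1, 3): 1, (4, 1): 1}
degree_pair = samplewise_degree(mono)
total_degree = sum(mono.values())
```

For this monomial the pair is `(3, 2)` while the total degree is 4. In general a monomial of total degree $\ell$ touches at most $\ell$ samples with degree at most $\ell$ in each, so it lies in $\mathcal{C}_{\ell,\ell}$.
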

Definition C.2 ($n$-sample $\tau$-distinguisher).

For the hypothesis testing problem between $D_0$ (null distribution) and $D_1$ (alternate distribution) over $\mathcal{X}$, we say that a function $p:\mathcal{X}^n\to\mathbb{R}$ is an $n$-sample $\tau$-distinguisher if $|\operatorname*{\mathbf{E}}_{X\sim D_0^{\otimes n}}[p(X)]-\operatorname*{\mathbf{E}}_{X\sim D_1^{\otimes n}}[p(X)]|\geq\tau\sqrt{\operatorname*{\mathbf{Var}}_{X\sim D_0^{\otimes n}}[p(X)]}$. We call $\tau$ the advantage of the polynomial $p$.
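
The definition can be illustrated with a Monte Carlo estimate of the advantage on a toy pair of distributions. Everything specific in this sketch is an assumption for illustration and is unrelated to the paper's hard instance: $D_0=\mathcal{N}(0,1)$, $D_1=\mathcal{N}(\mu,1)$, the statistic $p$ is the sample mean, and the helper names are hypothetical. For this toy pair the exact advantage is $\mu\sqrt{n}$. The second helper records the standard Chebyshev bound for a test that thresholds $p$ at the midpoint of the two means (the midpoint is one possible choice of threshold).

```python
import numpy as np

def empirical_advantage(statistic, sample_null, sample_alt, n, trials, rng):
    """Estimate |E_0[p] - E_1[p]| / sqrt(Var_0[p]) by simulation."""
    null_vals = np.array([statistic(sample_null(rng, n)) for _ in range(trials)])
    alt_vals = np.array([statistic(sample_alt(rng, n)) for _ in range(trials)])
    gap = abs(null_vals.mean() - alt_vals.mean())
    return gap / null_vals.std(ddof=1)

def chebyshev_null_error_bound(tau):
    """Thresholding at the midpoint of the two means errs under the null
    with probability at most Var_0 / (gap / 2)^2 <= 4 / tau^2."""
    return min(1.0, 4.0 / tau**2)

rng = np.random.default_rng(0)
mu, n = 0.4, 25                      # hypothetical toy parameters
tau_hat = empirical_advantage(
    np.mean,
    lambda g, size: g.normal(0.0, 1.0, size),   # n draws from D_0
    lambda g, size: g.normal(mu, 1.0, size),    # n draws from D_1
    n, 20000, rng,
)
exact_tau = mu * np.sqrt(n)          # equals 2.0 for these parameters
```

Here `tau_hat` should be close to `exact_tau`, and `chebyshev_null_error_bound(tau)` decays like $1/\tau^2$, matching the $O(1/\tau^2)$ error rate for thresholded tests.
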

Note that if a function $p$ has advantage $\tau$, then Chebyshev's inequality implies that one can furnish a test $p':\mathcal{X}^n\to\{D_0,D_1\}$ by thresholding $p$ such that the probability of error under the null distribution is at most $O(1/\tau^2)$. We will think of the advantage $\tau$ as a proxy for the inverse of the probability of error (see Theorem 4.3 in [KWB22] for a formalization of this intuition under certain assumptions), and we will show that the advantage of all polynomials up to a certain degree is $O(1)$. It can be shown that for hypothesis testing problems of the form of Problem C.1, the best possible advantage among all polynomials in $\mathcal{C}_{r,\ell}$ is captured by the low-degree likelihood ratio (see, e.g., [BBH+21, KWB22]):

$$\left\|\operatorname*{\mathbf{E}}_{\mathbf{V}\sim\mathcal{U}(S)}\left[\left(\overline{P}_{A,\mathbf{V}}^{\otimes n}\right)^{\leq r,\ell}\right]-1\right\|_{D_0^{\otimes n}},$$

where in our case $D_0=\mathcal{N}(\mathbf{0},\mathbf{I}_d)$.
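
For intuition about this quantity, the following sketch evaluates a low-degree likelihood ratio norm in a much simpler setting than Problem C.1. The setting is an assumption chosen only because it has a closed form: one sample ($n=1$), one dimension, $D_0=\mathcal{N}(0,1)$, and alternative $P=\mathcal{N}(\mu,1)$, with the projection taken onto polynomials of degree at most $r$. In the orthonormal Hermite basis $h_k=\mathrm{He}_k/\sqrt{k!}$, the coefficients of the likelihood ratio are $\operatorname*{\mathbf{E}}_{X\sim P}[h_k(X)]=\mu^k/\sqrt{k!}$, so the squared norm is $\sum_{k=1}^{r}\mu^{2k}/k!$. The function names are hypothetical.

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as He

def hermite_coefficients(mu, r, quad_nodes=40):
    """Coefficients <P/D_0, h_k>_{D_0} = E_{Z ~ N(0,1)}[h_k(Z + mu)], k = 0..r,
    computed with Gauss-Hermite quadrature (weight exp(-x^2 / 2))."""
    x, w = He.hermegauss(quad_nodes)
    w = w / math.sqrt(2.0 * math.pi)          # normalize to the N(0, 1) density
    coeffs = []
    for k in range(r + 1):
        basis = np.zeros(k + 1)
        basis[k] = 1.0                        # selects He_k
        h_k = He.hermeval(x + mu, basis) / math.sqrt(math.factorial(k))
        coeffs.append(float(np.sum(w * h_k)))
    return np.array(coeffs)

def low_degree_lr_norm(mu, r):
    """|| (P / D_0)^{<= r} - 1 ||_{D_0}: drop the constant term, take the l2 norm."""
    c = hermite_coefficients(mu, r)
    return math.sqrt(float(np.sum(c[1:] ** 2)))

mu, r = 0.5, 3                                # hypothetical toy parameters
numeric = low_degree_lr_norm(mu, r)
closed_form = math.sqrt(sum(mu ** (2 * k) / math.factorial(k) for k in range(1, r + 1)))
```

As $r\to\infty$ the squared norm tends to $e^{\mu^2}-1$, the $\chi^2$-divergence between the two toy distributions. Truncating at low degree can only make it smaller, which is why bounding this norm limits what low-degree tests can achieve.
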

To show that the low-degree likelihood ratio is small, we use the result from [BBH+21] stating that a lower bound for the SQ dimension translates to an upper bound for the low-degree likelihood ratio. Therefore, given that we have already established in the previous section that $\mathrm{SD}(\mathcal{B}(\{P_{A,\mathbf{V}}\}_{\mathbf{V}\in S},\mathcal{N}(\mathbf{0},\mathbf{I}_d)),\gamma,\beta)=2^{d^{c}}$ for $\gamma=\Omega(d)^{(t+1)/10}\chi^2(A,\mathcal{N}(\mathbf{0},\mathbf{I}_d))$ and $\beta=\chi^2(A,\mathcal{N}(0,1))$, one can obtain the following corollary:

Theorem C.3.

Let $c$ be a sufficiently small positive constant. Consider the hypothesis testing problem of Problem C.1, where the distribution $A$ matches the first $t$ moments with $\mathcal{N}(\mathbf{0},\mathbf{I}_m)$. For any $d\in\mathbb{Z}_+$ with $d=t^{\Omega(1/c)}$, any $n\leq\Omega(d)^{(t+1)/10}/\chi^2(A,\mathcal{N}(\mathbf{0},\mathbf{I}_m))$ and any even integer $\ell<d^{c}$, we have that

$$\left\|\operatorname*{\mathbf{E}}_{\mathbf{V}\sim\mathcal{U}(S)}\left[\left(\overline{P}_{A,\mathbf{V}}^{\otimes n}\right)^{\leq\infty,\ell}\right]-1\right\|_{D_0^{\otimes n}}\leq 1\;.$$

The interpretation of this result is that unless the number of samples used $n$ is greater than $\Omega(d)^{(t+1)/10}/\chi^2(A,\mathcal{N}(\mathbf{0},\mathbf{I}_m))$, any polynomial of degree roughly up to $d^{c}$ fails to be a good test (note that any polynomial of degree $\ell$ has sample-wise degree at most $(\ell,\ell)$).