Quantum phase estimation by compressed sensing

Changhao Yi State Key Laboratory of Surface Physics and Department of Physics, Fudan University, Shanghai 200433, China Institute for Nanoelectronic Devices and Quantum Computing, Fudan University, Shanghai 200433, China Cunlu Zhou Jun Takahashi Center for Quantum Information and Control, Department of Physics and Astronomy, University of New Mexico, NM 87131, USA

Abstract

As a signal recovery algorithm, compressed sensing is particularly useful when the data have low complexity and samples are scarce, which matches the setting of quantum phase estimation (QPE). In this work we present a new Heisenberg-limited QPE algorithm for early fault-tolerant quantum computers based on compressed sensing. More specifically, given many copies of a suitable initial state and queries to a specific unitary matrix, our algorithm recovers the phase with a total runtime $\mathcal{O}(\epsilon^{-1}\text{poly}\log(\epsilon^{-1}))$ , where $\epsilon$ is the desired accuracy. Moreover, the maximal runtime satisfies $T_{\max}\epsilon\ll\pi$ , which is comparable to state-of-the-art algorithms, and our algorithm is robust against a certain amount of noise from sampling and state preparation.

1 Introduction

Quantum phase estimation (QPE) [1] is one of the most useful subroutines in quantum computing and plays an important role in many promising quantum applications [2, 3, 4, 5]. Given a unitary matrix $U$ and one of its eigenvectors $|\Phi\rangle$ such that $U|\Phi\rangle=e^{i2\pi\theta}|\Phi\rangle$ , the task of QPE is to estimate the phase $\theta$ to within a given accuracy with high probability. The problem of estimating multiple phases of $U$ is referred to as the quantum eigenvalue estimation problem (QEEP) [5, 6, 7, 8, 9, 10]. When $U$ is the evolution operator under a Hamiltonian $H$ , QPE is equivalent to estimating a specific energy level $E_{0}$ to accuracy $\epsilon$ [7, 8].

While fully fault-tolerant quantum computers may still be years away, early fault-tolerant quantum computers with a limited number of logical qubits and limited circuit depth are expected to be realized much sooner and to solve nontrivial tasks that demonstrate practical quantum advantage. Given the crucial role of QPE in many such tasks, it is imperative to design QPE algorithms specifically tailored to early fault-tolerant quantum computers. The standard textbook QPE algorithm [11] does not require an exact eigenstate as the initial state and requires only one measurement, but it uses many ancilla qubits and controlled operations, which is experimentally demanding. Kitaev's original iterative QPE algorithm [1] uses only one ancilla qubit and one controlled operation (see Fig. 1), but it requires the initial state to be an exact eigenstate, whose preparation can be a difficult task in itself. Therefore, neither algorithm is suitable for early fault-tolerant quantum computers.

Much of the recent work [7, 12, 13] in QPE for early fault-tolerant quantum computers has focused on designing better protocols to improve various aspects of Kitaev’s original QPE algorithm. More specifically, the following properties are desired in designing such algorithms:

• The quantum circuit should be simple, using at most one ancilla qubit and one controlled operation.
• The initial state need not be an exact eigenstate of $U$ .
• The total runtime achieves the Heisenberg limit, i.e., the total cost should be $\mathcal{O}(\epsilon^{-1}\operatorname{poly}\log(\epsilon^{-1}\delta^{-1}))$ for estimating the phase $\theta$ to accuracy $\epsilon$ with probability $1-\delta$ .
• When the overlap of the initial state and the targeted eigenstate is large, the maximal runtime $T_{\max}$ (hence the maximal circuit depth) can be much smaller than $\pi/\epsilon$ .

Figure 1: The one-ancilla quantum circuit used in Kitaev-type QPE algorithms. The measurement is done in the $Z$ basis. We regard the outcome $|0\rangle$ as the value $+1$ and the outcome $|1\rangle$ as the value $-1$ . $\mathbf{H}$ is the Hadamard gate. $\mathbf{W}$ has two choices: when $\mathbf{W}=I$ , the measurement outcome is $\pm 1$ with probability $(1\pm\text{Re}(\langle\Phi|U(t)|\Phi\rangle))/2$ respectively; when $\mathbf{W}=S^{\dagger}$ , the Hermitian conjugate of the phase gate $S$ , the measurement outcome is $\pm 1$ with probability $(1\pm\text{Im}(\langle\Phi|U(t)|\Phi\rangle))/2$ instead. Averaging over many test outcomes gives an estimate of the true signal $\langle\Phi|U(t)|\Phi\rangle$ .

The procedure of these algorithms can be separated into a quantum part and a classical part. Usually, the quantum part is a combination of Hamiltonian simulation and the Hadamard test (see Fig. 1). Hamiltonian simulation algorithms [14] are used to implement the evolution operator $U(t)$ . Longer evolution times require more quantum gates, and the best known circuit complexity for implementing $U(t)$ without ancilla qubits is almost linear in $\|H\|\cdot|t|$ [15, 16]. The Hadamard test produces information about $U(t)$ in the form of a complex signal: given an initial state $|\Phi\rangle$ , $\langle\Phi|U(t)|\Phi\rangle$ can be estimated from many runs of the Hadamard test. For QPE, the signal $\langle\Phi|U(t)|\Phi\rangle$ is dominated by a single complex sinusoid (i.e., $f(t)=Ae^{ikt}$ ), and the objective is to estimate its frequency $k$ . For QEEP, the signal is a linear combination of multiple complex sinusoids (i.e., $f(t)=\sum_{n}A_{n}e^{ik_{n}t}$ ). The target signal $f(t)$ is said to have length $N$ and sparsity $K$ if $t$ takes integer values in $[1,N]$ and $f$ contains $K$ distinct frequencies $\{k_{n}\}$ . For QEEP, $K$ is usually assumed to be much smaller than $N$ (for QPE, $K=1$ ). The classical part of QPE and QEEP estimates frequencies from these statistically sampled sparse signals, which is akin to the objective of sparse Fourier transform algorithms [17].

There are several criteria for evaluating the performance of a sparse Fourier transform algorithm, chiefly the runtime complexity, the sample complexity, and the resolution. Here the runtime complexity quantifies how long the algorithm takes, and the sample complexity measures the number of time-domain signal samples it requires. For example, the Fast Fourier Transform [18] has runtime complexity $\mathcal{O}(N\log N)$ and sample complexity $\mathcal{O}(N)$ . So far, the best runtime complexity is $\mathcal{O}(K\log^{c}(N)\log(N/K))$ with $c>2$ [19], and the most sample-efficient algorithm requires only $\mathcal{O}(K\log K\log N)$ samples [20]. In practice the data are usually noisy, so the algorithm must also be robust. Multiple Signal Classification (MUSIC) [21] and Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) [22] are two examples of robust classical signal processing algorithms. Given the unique characteristics of our quantum setting, we prioritize the sample complexity, resolution, and robustness of an algorithm.

Many of these classical signal processing algorithms have been applied to QEEP. To the best of our knowledge, [6] was the first attempt to solve QEEP with Hadamard tests, treating it as a time-series analysis problem. Later, [7] emphasized the importance of Heisenberg-limited scaling and listed several other requirements for the post-processing algorithm. By applying Fourier-filter function techniques, they designed the first Heisenberg-limited QPE algorithm for early fault-tolerant quantum computers. Their algorithm was further improved in [5, 8], where the maximal evolution time was substantially reduced. Two recent QPE algorithms [12, 13] were inspired by Robust Phase Estimation [23, 24, 25]. Both have a hierarchical structure in which the unit evolution times are taken from $\{\tau_{0},2\tau_{0},\cdots,2^{J-1}\tau_{0}\}$ . These algorithms improved the relation between $T_{\max}$ , the initial overlap $p_{0}$ , and the final accuracy $\epsilon$ . When the overlap $p_{0}$ is large, [12] reduces the prefactor $\beta$ in the maximal runtime scaling $T_{\max}=\beta/\epsilon$ by using a subroutine called quantum complex exponential least squares (QCELS). In contrast to [7], where the prefactor $\beta$ is at least $\pi$ , the prefactor in [12] can be arbitrarily close to $0$ as $p_{0}\to 1$ . The more recent work [13] further relaxed the requirement on $p_{0}$ . In [26] and [27], these last two QPE algorithms were extended to the QEEP setting.

In this work we present a new algorithmic framework for solving both QPE and QEEP using compressed sensing [28, 29, 30], which, to our knowledge, has not been carefully considered before. Compressed sensing is a prominent signal recovery framework with wide applications in domains such as time-frequency analysis, image processing, and quantum state tomography [31, 32, 33]. It assumes that the signal is sparse and recovers the entire signal from a few samples by solving a linear or semidefinite programming (SDP) problem. Here, sparsity means that the signal contains only a small number of sinusoidal components. The small number of required samples and the robustness make compressed sensing an appealing choice for designing a QPE algorithm.

Our main contribution is a simple and robust classical post-processing algorithm for QPE based on compressed sensing. In QPE, we regard both the statistical uncertainty of the Hadamard tests and the inaccuracy of the initial state preparation as noise. To extract the target frequency $f_{0}$ (which is linearly related to the target energy $E_{0}$ ) in the presence of noise, our main idea is to use the robust recovery property of the convex relaxation algorithm [29], a modified version of vanilla compressed sensing. For signal vectors of size $N$ , when the frequency $f_{0}$ is nearly on-grid ( $f_{0}\approx k_{0}/N,k_{0}\in\mathbb{Z}$ ) and the noise in each sample is bounded by a constant, the convex relaxation algorithm can recover $f_{0}$ from only $\mathcal{O}(\log N)$ samples, which leads to Heisenberg-limited scaling. With no prior knowledge about $f_{0}$ (i.e., $f_{0}$ may be off-grid), our algorithm can still find a grid-shift parameter $\nu$ such that after the shift $f_{0}\to f_{0}-\nu$ the new signal is nearly on-grid, so the convex relaxation algorithm can still be applied. By searching for the optimal grid-shift parameter in a finite set, we eventually estimate $f_{0}$ to accuracy $\beta N^{-1}$ , where $\beta$ is a constant determined by the noise and the spacing between the grid-shift parameters. As for the maximal runtime $T_{\max}$ , since the samples of the compressed sensing algorithm are integers in $[1,N]$ , $T_{\max}$ scales linearly in $N$ , and $T_{\text{total}}$ is $\mathcal{O}(N\log N)$ .

The rest of the paper is organized as follows. We start with preliminaries about QEEP and compressed sensing in Sec. 2. We then introduce our QPE algorithm based on compressed sensing in Sec. 3 and prove several analytical results including its Heisenberg-limit scaling. In Sec. 4 we numerically test the performance of our algorithm and compare it to previous works. Finally, we summarize several open problems and potential future research directions in Sec. 5.

2 Preliminaries

The notations frequently used in the main text are summarized in Table 1.

Table 1: Notations used in the main text

Notation	Meaning
$y(t)$	sampled time-domain signal
$y^{0}(t)$	ideal time-domain signal
$z(t)$	noise
$x(k)$	ideal frequency-domain signal
$\mathcal{T}$	set of evolution times
$r$	sampling ratio
$\Omega$	samples in compressed sensing
$\tau$	unit time step
$\epsilon$	accuracy on energy level
$\delta$	failure probability
$\eta$	noise tolerance for each signal

2.1 QEEP as a signal recovery problem

In this section, we express QEEP as a sparse signal recovery problem. Given a Hamiltonian with spectral decomposition $H=\sum_{\ell}E_{\ell}P_{\ell}$ , where $\{E_{\ell}\}$ are the energy levels and $\{P_{\ell}\}$ are the projectors onto the corresponding eigenspaces, and an initial state $|\Phi\rangle$ , the time-domain signal in QEEP is given by

y^{0}(t)=\langle\Phi|e^{-iHt}|\Phi\rangle=\sum_{\ell}\langle\Phi|P_{\ell}|\Phi\rangle e^{-iE_{\ell}t}.

(1)
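As a small illustration of Eq. (1), the Python sketch below evaluates $y^{0}(t)$ directly from a spectral decomposition. The energies and overlaps are hypothetical toy values, not taken from any model in this work; the only assumption is that the overlaps $\langle\Phi|P_{\ell}|\Phi\rangle$ are non-negative and sum to one, so that $y^{0}(0)=1$ and $|y^{0}(t)|\leq 1$ .

```python
import cmath

def ideal_signal(t, energies, overlaps):
    """Return y0(t) = sum_l <Phi|P_l|Phi> * exp(-i E_l t), as in Eq. (1)."""
    return sum(p * cmath.exp(-1j * e * t) for e, p in zip(energies, overlaps))

# Hypothetical toy spectrum: three levels whose overlaps sum to 1.
energies = [0.5, 1.3, 2.1]
overlaps = [0.9, 0.07, 0.03]

y0_at_zero = ideal_signal(0.0, energies, overlaps)   # equals the sum of the overlaps
samples = [ideal_signal(t, energies, overlaps) for t in range(1, 11)]
```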

Denote the set of energy levels that have non-zero overlaps with $|\Phi\rangle$ as the target set

\Xi=\{E_{\ell}:\langle\Phi|P_{\ell}|\Phi\rangle>0\}.

(2)

Each element in $\Xi$ corresponds to a sinusoidal function whose frequency is determined by $E_{\ell}$ . We assume $|\Xi|$ is small so that $y^{0}(t)$ can be regarded as a sparse signal. More specifically, in compressed sensing algorithms, sparsity usually means that $|\Xi|=\mathcal{O}(\log(N))$ [28], where $N$ is the size of the discrete signals.

The task of a QEEP algorithm is equivalent to estimating $\Xi$ to a certain accuracy using data obtained from Hadamard tests. Denoting the output of the QEEP algorithm by $\Xi^{\ast}$ , we require $|\Xi|=|\Xi^{\ast}|$ and

\forall E\in\Xi,\quad\min_{E^{\ast}\in\Xi^{\ast}}|E-E^{\ast}|\leq\epsilon.

(3)

We have the freedom to choose the set of evolution times to use in the algorithm. Denote the set of evolution times as $\mathcal{T}$ . For each time $t\in\mathcal{T}$ , $y^{0}(t)$ can be obtained from averaging over the Hadamard tests. More precisely, when $\mathbf{W}=I$ , the measurement outcome in Fig. 1 is a random variable

h_{x}(t)=\begin{cases}+1,\quad p=\frac{1}{2}(1+\text{Re}(y^{0}(t))),\\ -1,\quad p=\frac{1}{2}(1-\text{Re}(y^{0}(t))).\\ \end{cases}

(4)

Similarly, when $\mathbf{W}=S^{\dagger}$ , we obtain

h_{y}(t)=\begin{cases}+1,\quad p=\frac{1}{2}(1+\text{Im}(y^{0}(t))),\\ -1,\quad p=\frac{1}{2}(1-\text{Im}(y^{0}(t))).\\ \end{cases}

(5)

Combining the two outcomes as $h(t)=h_{x}(t)+ih_{y}(t)$ gives an unbiased estimator of $y^{0}(t)$ :

E[h(t)]=E[h_{x}(t)+ih_{y}(t)]=y^{0}(t).

(6)

After drawing $M$ samples of the random variable $h(t)$ and averaging them, we obtain a noisy signal:

y(t)=\overline{h(t)}=y^{0}(t)+z(t),

(7)

then we use the noisy samples $\{(t,y(t)),t\in\mathcal{T}\}$ to recover $\Xi$ . Here $z(t)$ originates from the statistical uncertainty of the Hadamard tests. Hoeffding’s inequality ensures that with probability $1-\delta^{\prime}$ , we have

|z(t)|=\mathcal{O}\left(\sqrt{\frac{1}{M}\log\frac{1}{\delta^{\prime}}}\right).

(8)
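The estimator of Eqs. (4) to (7) can be mimicked classically. The sketch below draws $\pm 1$ outcomes with the probabilities of Eqs. (4) and (5) and averages them; it assumes the same number of shots $M$ for the $\mathbf{W}=I$ and $\mathbf{W}=S^{\dagger}$ circuits, and the value of $y^{0}(t)$ is a hypothetical placeholder rather than the output of an actual circuit.

```python
import cmath
import random

def hadamard_test_estimate(y0, shots, rng):
    """Average `shots` simulated outcomes for W = I and for W = S^dagger, then combine them.

    Each outcome is +1 with probability (1 + Re y0)/2 (resp. (1 + Im y0)/2) and -1
    otherwise, as in Eqs. (4) and (5); the combination h_x + i h_y follows Eqs. (6)-(7).
    """
    def mean_outcome(expectation):
        p_plus = 0.5 * (1.0 + expectation)
        return sum(1 if rng.random() < p_plus else -1 for _ in range(shots)) / shots

    return complex(mean_outcome(y0.real), mean_outcome(y0.imag))

rng = random.Random(7)
y0 = 0.9 * cmath.exp(-0.8j)          # hypothetical ideal signal value y0(t)
y = hadamard_test_estimate(y0, shots=20000, rng=rng)
z = y - y0                           # statistical noise z(t) of Eq. (7)
```

With 20000 shots per basis, each component of $z(t)$ has standard deviation at most $1/\sqrt{20000}\approx 0.007$ , in line with the $M^{-1/2}$ scaling of Eq. (8).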

In the rest of the paper, $z(t)$ may denote different quantities in different places, but it always represents the part of the signal that should be treated as noise. We introduce a noise tolerance parameter $\eta$ such that the signal recovery algorithm returns accurate estimates of $\{E_{\ell}\}$ as long as the noise in each sample is at most $\eta$ . To guarantee $|z(t)|<\eta$ for all $t\in\mathcal{T}$ with probability $1-\delta$ , a union bound shows that it suffices to have $\delta^{\prime}\leq\delta/|\mathcal{T}|$ , so $M$ scales as $\mathcal{O}(\log(|\mathcal{T}|/\delta)/\eta^{2})$ . For a rigorous proof, see Appendix A of [12].
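The scaling of $M$ above is stated only up to constants. One concrete choice, assuming the error budget is split evenly between the real and imaginary parts, follows from Hoeffding's inequality for an average of $M$ independent $\pm 1$ variables, $P(|\text{dev}|\geq s)\leq 2e^{-Ms^{2}/2}$ : with $s=\eta/\sqrt{2}$ per part and a union bound over the two parts and the $|\mathcal{T}|$ evolution times, $M\geq 4\ln(4|\mathcal{T}|/\delta)/\eta^{2}$ suffices. The helper name below is hypothetical.

```python
import math

def shots_per_time(num_times, delta, eta):
    """Hypothetical helper: shots M per evolution time so that |z(t)| <= eta for all t
    with probability at least 1 - delta.

    Each of the 2 * num_times real averages deviates by more than eta / sqrt(2) with
    probability at most 2 exp(-M eta^2 / 4); requiring the union bound
    4 * num_times * exp(-M eta^2 / 4) <= delta gives M >= 4 ln(4 num_times / delta) / eta^2.
    """
    return math.ceil(4.0 * math.log(4.0 * num_times / delta) / eta**2)

M = shots_per_time(num_times=10, delta=0.01, eta=0.1)   # 3318 shots per time
```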

The total experimental cost can be captured by the total runtime $T_{\text{total}}$ , which reflects the total circuit depth needed to complete the entire algorithm. In Hamiltonian simulation, the circuit complexity of implementing the operator $e^{-iHt}$ scales linearly with $\|H\|\cdot|t|$ [15] ( $t$ can be negative). Thus, in this setup, the total runtime is

T_{\text{total}}=\sum_{t\in\mathcal{T}}M\times|t|=\mathcal{O}\left(\log(|\mathcal{T}|/\delta)\cdot\eta^{-2}\cdot\sum_{t\in\mathcal{T}}|t|\right).

(9)

For instance, if the signal recovery algorithm has parameters $\eta=\mathcal{O}(1)$ , $\max_{t\in\mathcal{T}}|t|=\mathcal{O}(\epsilon^{-1})$ , and $|\mathcal{T}|=\mathcal{O}(\text{poly}\log(\epsilon^{-1}))$ , then it achieves the Heisenberg limit. We will see that our compressed sensing algorithm fits this description. The maximal runtime $T_{\max}=\max_{t\in\mathcal{T}}|t|$ , which reflects the maximal circuit depth, is particularly important for early fault-tolerant quantum computers.
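Eq. (9) is a direct sum over the chosen evolution times. The sketch below evaluates it, together with $T_{\max}$ , for times that are integer multiples $n\tau$ of a unit step with $n$ drawn at random from $\{1,\dots,N\}$ , the sampling pattern used later for compressed sensing. The values of $N$ , $\tau$ , $r$ , $\delta$ , $\eta$ , and the Hoeffding-based shot count $M=\lceil 4\ln(4|\mathcal{T}|/\delta)/\eta^{2}\rceil$ are illustrative assumptions.

```python
import math
import random

def runtime_costs(times, shots):
    """Return (T_total, T_max) as in Eq. (9): every time t is run `shots` times."""
    t_total = sum(shots * abs(t) for t in times)
    t_max = max(abs(t) for t in times)
    return t_total, t_max

rng = random.Random(1)
N, tau, r, delta, eta = 1024, 0.01, 0.4, 0.01, 0.1      # hypothetical parameters
times = [n * tau for n in range(1, N + 1) if rng.random() < r]
M = math.ceil(4 * math.log(4 * len(times) / delta) / eta**2)
T_total, T_max = runtime_costs(times, M)
```

Since every sampled $n$ is at most $N$ , $T_{\max}\leq N\tau$ , while $T_{\text{total}}$ grows with both $M$ and the number of sampled times.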

When $\Xi$ contains only one energy level, the task becomes QPE. In QPE, without loss of generality, we mainly discuss the estimation of the ground-state energy $E_{0}$ , i.e., the smallest eigenvalue of $H$ . (In this work we do not consider the hardness of preparing the initial state. From the point of view of phase estimation, the ground-state energy is no different from other eigenvalues as long as one can prepare an initial state that is close enough to the target eigenstate.) In general, we do not expect to be able to prepare the exact ground state, but we assume that the initial state has a large overlap with the ground state:

|\Phi\rangle=\sqrt{1-\gamma}|\Psi\rangle+\sqrt{\gamma}|\Psi^{\perp}\rangle.

(10)

Note that $\gamma=1-p_{0}$ , where $p_{0}=|\langle\Psi|\Phi\rangle|^{2}$ is the initial overlap mentioned in the Introduction. As long as $\gamma$ is small enough, the signal is still dominated by $e^{-iE_{0}t}$ , and the ground-state energy $E_{0}$ can be estimated efficiently.
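Eq. (10) only fixes the total weight $\gamma$ outside the ground state. A quick numerical check, assuming a hypothetical split of $\gamma$ over two excited levels, illustrates that the signal deviates from $e^{-iE_{0}t}$ by at most $\gamma+\sum_{\ell\neq 0}\gamma_{\ell}=2\gamma$ ; this is the origin of the $2\gamma$ term in the noise budget of Sec. 3.

```python
import cmath

def signal_with_imperfect_state(t, e0, gamma, excited):
    """y0(t) when weight 1 - gamma sits on E0 and (E_l, gamma_l) pairs carry the rest."""
    y = (1 - gamma) * cmath.exp(-1j * e0 * t)
    y += sum(g * cmath.exp(-1j * e * t) for e, g in excited)
    return y

e0, gamma = 0.4, 0.05
excited = [(1.7, 0.03), (2.9, 0.02)]      # hypothetical excited levels; weights sum to gamma

# Largest distance to the pure ground-state signal over integer times t = 0..199.
deviation = max(abs(signal_with_imperfect_state(t, e0, gamma, excited) - cmath.exp(-1j * e0 * t))
                for t in range(200))
```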

2.2 Compressed sensing for signal recovery

In this section we introduce the concepts and notation for compressed sensing used below. Given a vector $v=[v_{1},v_{2},\cdots,v_{N}]^{\top}$ , its $1$ -norm, $2$ -norm, and $\infty$ -norm are defined as

\|v\|_{1}=\sum_{n=1}^{N}|v_{n}|,\quad\|v\|_{2}=\left(\sum_{n=1}^{N}|v_{n}|^{2}\right)^{1/2},\quad\|v\|_{\infty}=\max_{n}|v_{n}|.

(11)

In the following paragraph, we use $k$ to label the indices of entries in frequency domain, and use $n$ to label the indices of entries in time domain. Denote the set of integers from 1 to $N$ as $[N]$ . In regular compressed sensing, we deal with a time-domain discrete signal $y^{0}$ in the form of

y^{0}_{n}=\sum_{f\in\mathcal{F}}c_{f}e^{-i2\pi fn},\quad n\in[N]

(12)

where $c_{f}\in[0,1],\sum_{f}c_{f}=1,f\in[0,1)$ , and $\mathcal{F}$ is the set of frequency support. The time-domain signal can thus be written as an $N$ -dimensional vector

y^{0}=[y^{0}_{1},y^{0}_{2},\cdots,y^{0}_{N}]^{\top}.

(13)

Define the Fourier matrix as $F_{kn}=e^{-i2\pi kn/N},\ k,n\in[N]$ . By labeling its columns with $\{w_{n},n\in[N]\}$ , we have

F=[w_{1},\ w_{2},\ \cdots,\ w_{N}].

(14)

Throughout the paper, if the frequency $f$ satisfies

\exists k\in[N],\quad f=k/N,

(15)

then we say $f$ is on-grid, otherwise it is off-grid. For a frequency $f\in[0,1)$ , we define its off-grid deviation as

\nu=f-k_{f}/N,\text{ where }k_{f}=\arg\min_{k}|f-k/N|.

(16)
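A small helper for Eq. (16) is sketched below. Since frequencies live on the circle $[0,1)$ , we assume that the distance $|f-k/N|$ is measured modulo 1, so the grid point $k=N$ stands for frequency $0$ ; this wrap-around is our reading of the definition rather than something the text states explicitly, and the function name is hypothetical.

```python
def off_grid_deviation(f, N):
    """Hypothetical helper: return (k_f, nu) for f in [0, 1), with k_f in 1..N and nu = f - k_f/N.

    Distances are taken modulo 1, so nu always lies in [-1/(2N), 1/(2N)].
    """
    k = round(N * f) % N          # nearest grid index in 0..N-1
    nu = f - k / N
    if nu > 0.5:                  # f was rounded up to 1, which is identified with 0
        nu -= 1.0
    k_f = k if k != 0 else N      # index 0 is identified with N, since [N] = {1, ..., N}
    return k_f, nu

k_a, nu_a = off_grid_deviation(3.3 / 16, 16)   # -> (3, 0.3/16)
k_b, nu_b = off_grid_deviation(0.99, 16)       # -> (16, -0.01)
```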

If all $f\in\mathcal{F}$ are on-grid satisfying Eq. (15), the frequency-domain signal $x$ can be written in the form of a vector as

x=\frac{1}{N}F^{\dagger}y^{0},\quad x_{k}=\sum_{f\in\mathcal{F}}c_{f}\,\delta\left(k-Nf\right).

(17)

It is a sparse vector with $|\mathcal{F}|$ nonzero entries. The goal of compressed sensing is then to recover $x$ from a few noisy samples of $\{y^{0}_{n}\}$ .
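For an on-grid signal, Eq. (17) can be checked directly. The sketch below builds the Fourier matrix of Eq. (14) with the 1-based indices $k,n\in[N]$ used in the text, forms $y^{0}$ from a hypothetical two-frequency support, and confirms that $x=F^{\dagger}y^{0}/N$ has nonzero entries only at $k=Nf$ , with values $c_{f}$ .

```python
import cmath
import math

N = 16
support = {3: 0.7, 10: 0.3}     # hypothetical on-grid support: k -> c_f with f = k/N

# Time-domain signal of Eq. (12) for n = 1..N; y0[n-1] holds y0_n.
y0 = [sum(c * cmath.exp(-2j * math.pi * k * n / N) for k, c in support.items())
      for n in range(1, N + 1)]

# Fourier matrix of Eq. (14); F[k-1][n-1] holds F_{kn} = exp(-i 2 pi k n / N).
F = [[cmath.exp(-2j * math.pi * k * n / N) for n in range(1, N + 1)]
     for k in range(1, N + 1)]

# Eq. (17): x = F^dagger y0 / N, using (F^dagger)_{kn} = conj(F_{nk}); x[k-1] holds x_k.
x = [sum(F[n][k].conjugate() * y0[n] for n in range(N)) / N for k in range(N)]
```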

The sampling proceeds as follows. Choose a sampling ratio $r$ , and assign to each integer $n\in[N]$ a random variable $1_{n}$ that satisfies

P(1_{n}=1)=r,\quad P(1_{n}=0)=1-r.

(18)

Draw one sample from each $1_{n},n\in[N]$ , and denote the set of integers with $1_{n}=1$ as the sample set $\Omega$ . The size of the sample set concentrates around $Nr$ . Based on the choice of $\Omega$ , we define the partial inverse Fourier transformation and the signal samples as

	$\displaystyle F_{\Omega}=[w_{n_{1}},\ w_{n_{2}},\ \cdots,\ w_{n_{|\Omega|}}],$		(19)
	$\displaystyle y^{0}_{\Omega}=[y^{0}_{n_{1}},\ y^{0}_{n_{2}},\ \cdots,\ y^{0}_{n_{|\Omega|}}]^{\top}.$		(20)

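Note that $F$ has dimension $N\times N$ , while $F_{\Omega}$ has dimension $N\times|\Omega|$ . Since $F$ is symmetric, the columns $w_{n}$ , $n\in\Omega$ , are also the rows of $F$ that produce the samples, so the measurement map is $s\mapsto F_{\Omega}^{\top}s$ with entries $\sum_{k}s_{k}e^{-i2\pi kn/N}$ , $n\in\Omega$ ; with a slight abuse of notation, we write it as $F_{\Omega}s$ . With these notations, the compressed sensing algorithm can be viewed as solving the optimization problem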

\min\|s\|_{1},\quad\text{s.t.}\quad F_{\Omega}s=y^{0}_{\Omega},\quad s\in\mathbb{R}^{N},

(21)

which can be rewritten as a linear programming problem. When $Nr=\mathcal{O}(\log N)$ and the frequency support $\mathcal{F}$ is sparse in the sense that $|\mathcal{F}|\leq\mathcal{O}(\log N)$ , the optimal solution $s^{\#}$ equals the frequency-domain signal $x$ with high probability. Rigorous statements can be found in [28].
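To make the measurement model behind Eq. (21) concrete, the sketch below draws $\Omega$ by independent Bernoulli($r$) trials as in Eq. (18) and applies the sampling map $s\mapsto(\sum_{k}s_{k}e^{-i2\pi kn/N})_{n\in\Omega}$ to the true 1-sparse $x$ , which reproduces the samples $y^{0}_{\Omega}$ . It does not solve the linear program itself; $N$ , $r$ , $\chi$ , and the seed are illustrative choices, and the function names are hypothetical.

```python
import cmath
import math
import random

def sample_set(N, r, rng):
    """Omega: each n in 1..N is kept independently with probability r, Eq. (18)."""
    return [n for n in range(1, N + 1) if rng.random() < r]

def sampling_operator(s, omega, N):
    """Map a frequency-domain vector s (s[k-1] = s_k) to its time samples on omega."""
    return [sum(s[k - 1] * cmath.exp(-2j * math.pi * k * n / N) for k in range(1, N + 1))
            for n in omega]

rng = random.Random(3)
N, r, chi = 64, 0.25, 11
omega = sample_set(N, r, rng)
x = [0.0] * N
x[chi - 1] = 1.0                                   # 1-sparse on-grid signal, x_k = delta_{k,chi}
y0_omega = [cmath.exp(-2j * math.pi * chi * n / N) for n in omega]
residual = max(abs(a - b) for a, b in zip(sampling_operator(x, omega, N), y0_omega))
```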

Provided that the signal has extra noise $y_{n}=y^{0}_{n}+z_{n}$ where $|z_{n}|\leq\eta$ , the signal can be approximately recovered by the convex relaxation algorithm [29]:

\min\|s\|_{1},\quad s.t.\quad\|F_{\Omega}s-y_{\Omega}\|_{2}\leq\sqrt{|\Omega|}\eta.

(22)

The difference between $s^{\#}$ and $x$ depends on the size of $\eta$ .
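Solving Eq. (22) exactly calls for a second-order cone solver. As a dependency-free stand-in, the Python sketch below solves the closely related Lagrangian (LASSO) form $\min_{s}\frac{1}{2}\|F_{\Omega}s-y_{\Omega}\|_{2}^{2}+\lambda\|s\|_{1}$ with the iterative shrinkage-thresholding algorithm (ISTA), reading $F_{\Omega}s$ as the time samples $\sum_{k}s_{k}e^{-i2\pi kn/N}$ , $n\in\Omega$ . This substitution, the choice $\lambda=0.1|\Omega|$ , and all parameters are assumptions, not the paper's solver. The step size $1/N$ is safe because the sampled rows of $F$ are mutually orthogonal with squared norm $N$ .

```python
import cmath
import math
import random

def ista_l1(y, omega, N, lam, iters=200):
    """ISTA for min_s 0.5*||A s - y||_2^2 + lam*||s||_1, where (A s)_n = sum_k s_k e^{-i2pi kn/N}."""
    A = [[cmath.exp(-2j * math.pi * k * n / N) for k in range(1, N + 1)] for n in omega]
    step = 1.0 / N                     # 1/L with L = ||A||_2^2 = N, since A A^dagger = N I
    s = [0j] * N
    for _ in range(iters):
        resid = [sum(a * sk for a, sk in zip(row, s)) - yn for row, yn in zip(A, y)]
        for k in range(N):
            grad = sum(A[i][k].conjugate() * resid[i] for i in range(len(omega)))
            v = s[k] - step * grad
            mag = abs(v)
            # Complex soft-thresholding: shrink the modulus by step*lam, keep the phase.
            s[k] = v * (1 - step * lam / mag) if mag > step * lam else 0j
    return s

rng = random.Random(5)
N, chi, eta = 32, 7, 0.05
omega = [n for n in range(1, N + 1) if rng.random() < 0.5]
# Noisy on-grid samples y_n = e^{-i2pi chi n/N} + z_n with |z_n| = eta and a random phase.
y = [cmath.exp(-2j * math.pi * chi * n / N) + eta * cmath.exp(2j * math.pi * rng.random())
     for n in omega]
s = ista_l1(y, omega, N, lam=0.1 * len(omega))
k_hat = 1 + max(range(N), key=lambda k: abs(s[k]))   # convert 0-based index to k in 1..N
```

In the noiseless case a minimizer of this LASSO problem is the spike at $\chi$ scaled by $1-\lambda/|\Omega|$ , so the position of the largest entry, rather than its height, is the quantity to read off.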

The uniqueness and robustness of compressed sensing solution can be analyzed through the so-called dual certificate [30]. In optimization, the dual certificate is the optimal solution to the dual problem. In a regular compressed sensing task where all frequencies are on-grid, for each frequency support $\mathcal{F}$ and each sample set $\Omega$ , the dual certificate is an $N$ -dimensional random vector $p$ such that

	$\displaystyle\exists V,\quad p=F^{\dagger}_{\Omega}V,$		(23)
	$\displaystyle\forall k,\ k/N\in\mathcal{F},\quad p_{k}=1,$		(24)
	$\displaystyle\forall k,\ k/N\not\in\mathcal{F},\quad|p_{k}|<1-\varepsilon(\Omega),$		(25)

where $\varepsilon(\Omega)\in(0,1)$ is determined by the size of $\Omega$ . The existence of such a dual certificate is called the exact reconstruction principle [28]. Using the dual certificate as a pivot, we can quantify the part of $s^{\#}$ that is not supported on $\mathcal{F}$ , which in turn determines the accuracy of the algorithm. The basic idea is to split the entries of $s^{\#}$ into two parts:

\sum_{k/N\in\mathcal{F}}|s^{\#}_{k}|,\quad\sum_{k/N\not\in\mathcal{F}}|s^{\#}_{k}|.

(26)

Because the true frequency-domain signal $x$ satisfies the constraint in Eq. (22) and $s^{\#}$ minimizes the 1-norm, the 1-norm of $s^{\#}$ is upper bounded by the 1-norm of $x$ . On the other hand, the magnitude of the inner product between $s^{\#}$ and $p$ can be well estimated, and the restrictions on the entries of $p$ allow us to treat $\sum_{k/N\in\mathcal{F}}|s^{\#}_{k}|$ and $\sum_{k/N\not\in\mathcal{F}}|s^{\#}_{k}|$ separately. Combining these considerations eventually yields an upper bound on $\sum_{k/N\not\in\mathcal{F}}|s^{\#}_{k}|$ .

To write the QEEP signal in Eq. (1) in the form of the compressed sensing signal in Eq. (12), we introduce a unit time step $\tau$ , such that

y^{0}_{n}=y^{0}(n\tau),\quad\mathcal{T}=\{n\tau,n\in\Omega\}.

(27)

In the same framework, we have the following correspondence:

f=\frac{E_{\ell}\tau}{2\pi},\quad\mathcal{F}=\left\{\frac{E_{\ell}\tau}{2\pi}:\ E_{\ell}\in\Xi\right\}.

(28)

The target energy levels $\Xi$ can be directly obtained from the frequency support $\mathcal{F}$ . Note that the frequencies are defined on $[0,1)$ . To keep the order of energy levels unchanged, it is necessary to have

\forall\ell,\quad E_{\ell}\tau\in[0,2\pi).

(29)

This condition can always be satisfied by adding a constant to the Hamiltonian $H$ and choosing $\tau$ properly.

3 Single eigenvalue estimation

In this section, we derive a quantitative restriction on the noise tolerance $\eta$ in Eq. (22). Then, treating three sources of inexactness as noise, namely the statistical uncertainty of the Hadamard tests, the inaccuracy of the initial state, and the off-grid deviation, we prove performance bounds for our algorithm.

Let us start with the case when the signal contains only one frequency $\mathcal{F}=\{f_{0}\}$ and the frequency is on-grid: $f_{0}=\chi/N,\chi\in[N]$ , for which we are given the values of

y^{0}_{n}=e^{-i2\pi\chi n/N}

(30)

on a random sample set $\Omega$ . Note that the frequency-domain signal is $x_{k}=\delta_{k,\chi}$ . The following lemma shows how its dual certificate can be constructed.

Lemma 1.

Given values of signal $y^{0}_{n}=e^{-i2\pi\chi n/N}$ on a random sample set $\Omega$ with sampling ratio

r>\frac{35}{N}\ln\left(\frac{5N}{\delta}\right),

(31)

the random vector $p=F^{\dagger}_{\Omega}y^{0}_{\Omega}/|\Omega|$ satisfies

p_{\chi}=1;\quad|p_{k}|\leq\frac{1}{2},\ k\neq\chi

(32)

with probability at least $1-\delta$ .

Proof.

See Appendix A. ∎
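Lemma 1 can be illustrated numerically. The sketch below uses the explicit entries $p_{k}=|\Omega|^{-1}\sum_{n\in\Omega}e^{i2\pi(k-\chi)n/N}$ , draws $\Omega$ with a sampling ratio 5% above the bound in Eq. (31), and measures the largest off-peak entry. The values of $N$ , $\delta$ , $\chi$ , and the seed are illustrative choices, and a single run is of course not a proof.

```python
import cmath
import math
import random

def certificate(omega, N, chi):
    """Entries p_k, k = 1..N, of p = F_Omega^dagger y0_Omega / |Omega| for y0_n = e^{-i2pi chi n/N}."""
    # Lookup table e^{i 2 pi m / N}, m = 0..N-1, so each term is a table access.
    tw = [cmath.exp(2j * math.pi * m / N) for m in range(N)]
    return [sum(tw[((k - chi) * n) % N] for n in omega) / len(omega)
            for k in range(1, N + 1)]

N, delta, chi = 1024, 0.1, 300
r = 1.05 * 35 / N * math.log(5 * N / delta)     # slightly above the bound in Eq. (31)
rng = random.Random(11)
omega = [n for n in range(1, N + 1) if rng.random() < r]
p = certificate(omega, N, chi)
off_peak = max(abs(p[k - 1]) for k in range(1, N + 1) if k != chi)
```

Note that the bound in Eq. (31) exceeds one for small $N$ (e.g. $N=256$ with $\delta=0.1$ ), which is why a fairly large $N$ is used here.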

We can now prove the robust recovery of algorithm Eq. (22) using Lemma 1.

Lemma 2.

Given a noisy signal $y_{n}=e^{-i2\pi\chi n/N}+z_{n}$ and a random sample set $\Omega$ with sampling ratio satisfying Lemma 1, if the noise satisfies $\|z_{\Omega}\|_{2}\leq\sqrt{|\Omega|}\eta$ , the solution $s^{\#}$ to Eq. (22) satisfies:

\sum_{k\neq\chi}|s^{\#}_{k}|\leq 4\eta,\quad|s^{\#}_{\chi}|\geq 1-4\eta

(33)

with probability at least $1-\delta$ .

Proof.

See Appendix B. ∎

By Lemma 2, as long as $\eta<1/8$ , we have $|s^{\#}_{\chi}|\geq 1-4\eta>4\eta\geq\sum_{k\neq\chi}|s^{\#}_{k}|$ , so the solution to Eq. (22) still satisfies $\chi=\arg\max_{k}|s^{\#}_{k}|$ .

In practical situations we cannot assume that the frequency is exactly on-grid, but this is not a problem. First, when $Nf_{0}$ is very close to an integer, Eq. (22) can still approximately recover the signal. The following proposition quantifies how large the off-grid deviation can be.

Proposition 1.

Given a signal $y_{n}=e^{-i2\pi(\chi+\omega)n/N}$ with $\chi\in\mathbb{Z}_{N},\,\omega\in(-0.5,0.5]$ , and a sample set $\Omega$ with sampling ratio satisfying Lemma 1, if $|\omega|<1/(16\pi)$ , then with probability at least $1-\delta$ , the optimal solution $s^{\#}$ to Eq. (22) satisfies

\chi=\arg\max|s^{\#}_{k}|.

(34)

Proof.

In this situation, we have

	$\displaystyle z_{n}=e^{-i2\pi\chi n/N}\left(e^{-i2\pi\omega n/N}-1\right),$		(35)
	$\displaystyle|z_{n}|=2|\sin(\pi\omega n/N)|\leq 2\pi|\omega|n/N.$		(36)

According to Lemma 2, a sufficient condition for robust recovery is $|z_{n}|<1/8$ for all $n$ . Since $n\leq N$ , Eq. (36) gives $|z_{n}|\leq 2\pi|\omega|$ , so it is enough to have $|\omega|<1/(16\pi)$ . ∎
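The bound in the proof is easy to check numerically at the extreme allowed deviation $|\omega|=1/(16\pi)$ . The sketch below forms $z_{n}$ for $n=1,\dots,N$ , confirms $|z_{n}|=2|\sin(\pi\omega n/N)|$ , and confirms that the largest value, attained at $n=N$ , equals $2\sin(1/16)\approx 0.1249$ , just under $1/8$ . The values of $N$ and $\chi$ are arbitrary.

```python
import cmath
import math

N, chi = 256, 40
omega_dev = 1 / (16 * math.pi)              # the largest deviation allowed by Proposition 1
# Eq. (35): z_n is the difference between the off-grid signal and its on-grid neighbour.
z = [cmath.exp(-2j * math.pi * (chi + omega_dev) * n / N) - cmath.exp(-2j * math.pi * chi * n / N)
     for n in range(1, N + 1)]
# Eq. (36): |z_n| = 2|sin(pi * omega * n / N)|.
matches = all(abs(abs(v) - 2 * abs(math.sin(math.pi * omega_dev * n / N))) < 1e-10
              for n, v in zip(range(1, N + 1), z))
worst = max(abs(v) for v in z)
```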

The quantity $|\omega|$ in Proposition 1 is the off-grid deviation. If $|\omega|<1/(16\pi)$ , the estimate $f^{\#}=\arg\max_{k}|s^{\#}_{k}|/N$ is still a good approximation of the true frequency $(\chi+\omega)/N$ , and the accuracy still scales as $\mathcal{O}(N^{-1})$ . When the off-grid deviation is large, we can find a grid-shift parameter $\nu$ such that after the transformation $y_{n}\to y_{n}e^{i2\pi\nu n/N}$ the new signal is nearly on-grid. The details are given in Algorithm 1.

Besides the Hadamard tests and the off-grid deviation, the third type of noise comes from the preparation of the initial state. In practice, it is hard to prepare the exact ground state $|\Psi_{0}\rangle$ . Suppose the actual initial state is

|\Phi\rangle=\sqrt{1-\gamma}|\Psi_{0}\rangle+\sum_{\ell\neq 0}\sqrt{\gamma_{\ell}}|\Psi_{\ell}\rangle,

(37)

with $\sum_{\ell\neq 0}\gamma_{\ell}=\gamma$ , then $y^{0}_{n}$ can be expanded as

\langle\Phi|U(n\tau)|\Phi\rangle=(1-\gamma)e^{-iE_{0}n\tau}+\sum_{\ell\neq 0}\gamma_{\ell}e^{-iE_{\ell}n\tau}=e^{-iE_{0}n\tau}+\mathcal{O}(\gamma).

(38)

Note that only $e^{-iE_{0}n\tau}$ is the target signal, so we can treat the $\mathcal{O}(\gamma)$ term as noise as well. Writing $E_{0}\tau=2\pi(\chi+\omega)/N$ , each sampled signal can be decomposed as

$\displaystyle y_{n}$	$\displaystyle=\langle\Phi|U(n\tau)|\Phi\rangle+h_{n}$	(39)
	$\displaystyle=\langle\Psi|U(n\tau)|\Psi\rangle+h_{n}+\mathcal{O}(\gamma)$
	$\displaystyle=e^{-iE_{0}n\tau}+h_{n}+\mathcal{O}(\gamma)$
	$\displaystyle=e^{-i2\pi(\chi+\nu)n/N}+h_{n}+\mathcal{O}(\gamma)+e^{-i2\pi(\chi+\nu)n/N}\left(e^{-i2\pi(\omega-\nu)n/N}-1\right)$

where $h_{n}$ originates from the Hadamard tests, the $\mathcal{O}(\gamma)$ term from the state preparation, and $\nu$ is a grid-shift parameter. Recall from Lemma 2 that as long as $\eta<1/8$ , the noise does not spoil the signal recovery (after shifting the entire signal by $y_{n}\to y_{n}e^{i2\pi\nu n/N}$ ). We split $\eta$ into three parts: let $2\gamma\leq\eta_{1}$ , $|h_{n}|\leq\eta_{2}$ , $2\pi|\omega-\nu|<\eta_{3}$ , and $\eta_{1}+\eta_{2}+\eta_{3}\leq 1/8$ . The third condition bounds the last term in Eq. (39), since $|e^{-i2\pi(\omega-\nu)n/N}-1|\leq 2\pi|\omega-\nu|n/N\leq 2\pi|\omega-\nu|$ for $n\leq N$ . Given $\eta_{1}$ , $\eta_{2}$ , and $\eta_{3}$ , we require an initial-state overlap of at least $\sqrt{1-\eta_{1}^{2}/4}$ , perform at least $\mathcal{O}(\log|\Omega|/\eta_{2}^{2})$ Hadamard tests per sample, and use $\lceil 2\pi/\eta_{3}\rceil$ grid shifts. The complete QPE algorithm is presented in Algorithm 1.

Algorithm 1 QPE

Input: accuracy level $N$ , unit time step $\tau$ , failure probabilities $\delta_{1},\delta_{2}$ , noise tolerance parameters $\eta_{1},\eta_{2},\eta_{3}$ , Hamiltonian $H$ , and an initial state $|\Phi\rangle$ whose overlap with the ground state is larger than $\sqrt{1-\eta_{1}^{2}/4}$
Output: $E^{\ast}=2\pi k^{\ast}/(N\tau)$
1: Sample integers from $[N]$ , each with sampling ratio $r=\Theta(N^{-1}\ln(N/\delta_{1}))$ , and denote the sample set by $\Omega$
2: for $n\in\Omega$ do
3:   Prepare the initial state $|\Phi\rangle$ and the unitary operator $e^{-iHn\tau}$ ;
4:   Perform $\mathcal{O}(\log(|\Omega|/\delta_{2})/\eta_{2}^{2})$ Hadamard tests with $U=e^{-iHn\tau}$ and input state $|\Phi\rangle$ ;
5:   Set $y_{n}$ to the average of the test outcomes
6: end for
7: for $j=1,2,\cdots,\lceil 2\pi/\eta_{3}\rceil$ do
8:   Set $\nu=-0.5+j/\lceil 2\pi/\eta_{3}\rceil$ ;
9:   Generate the shifted signal $\{\tilde{y}_{n}=y_{n}\cdot e^{i2\pi\nu n/N},\ n\in\Omega\}$ ;
10:  Set $\eta=\eta_{1}+\eta_{2}+\eta_{3}$ and solve $\min\|s\|_{1}$ s.t. $\|F_{\Omega}s-\tilde{y}_{\Omega}\|_{2}\leq\sqrt{|\Omega|}\eta$ to obtain the solution $s^{\#}$ ;
11:  Set $k_{\nu}=\arg\max_{k}|s^{\#}_{k}|+\nu$ ;
12:  Calculate the total empirical error $\sum_{n\in\Omega}|y_{n}-e^{-i2\pi k_{\nu}n/N}|^{2}$
13: end for
14: Set $k^{\ast}$ to the $k_{\nu}$ with the smallest total empirical error
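To make the classical half of Algorithm 1 concrete, the Python sketch below runs the grid-shift loop and the final selection of $k^{\ast}$ on simulated data. Two substitutions are our assumptions and not the paper's method: the $\ell_{1}$ solve inside the loop is replaced by the simpler correlation argmax $\arg\max_{k}|\sum_{n\in\Omega}e^{i2\pi kn/N}\tilde{y}_{n}|$ (the vector $p$ of Lemma 1, which peaks at the same index when one frequency dominates), and the quantum data are simulated Hadamard-test averages for a hypothetical two-level spectrum with $\gamma=0.02$ . The sampling ratio $0.5$ is also illustrative and does not satisfy the bound of Lemma 1 for this small $N$ .

```python
import cmath
import math
import random

def qpe_grid_search(y, omega, N, eta3):
    """Grid-shift search of Algorithm 1; returns k* (in grid units) for samples y[n], n in omega."""
    n_shifts = math.ceil(2 * math.pi / eta3)
    best_k, best_err = None, float("inf")
    for j in range(1, n_shifts + 1):
        nu = -0.5 + j / n_shifts
        y_shift = {n: y[n] * cmath.exp(2j * math.pi * nu * n / N) for n in omega}
        # Stand-in for the l1 solve: correlate the shifted samples with every grid frequency.
        corr = [abs(sum(cmath.exp(2j * math.pi * k * n / N) * y_shift[n] for n in omega))
                for k in range(1, N + 1)]
        k_nu = 1 + max(range(N), key=corr.__getitem__) + nu
        # Total empirical error of the candidate frequency against the unshifted data.
        err = sum(abs(y[n] - cmath.exp(-2j * math.pi * k_nu * n / N)) ** 2 for n in omega)
        if err < best_err:
            best_k, best_err = k_nu, err
    return best_k

rng = random.Random(2024)
N, tau, gamma, shots = 64, 1.0, 0.02, 2000
k0 = 20.3                                        # hypothetical off-grid target, E0 = 2 pi k0/(N tau)
E0, E1 = 2 * math.pi * k0 / (N * tau), 2 * math.pi * 45.0 / (N * tau)
omega = [n for n in range(1, N + 1) if rng.random() < 0.5]

def hadamard_average(expectation):
    """Mean of `shots` simulated +-1 outcomes with P(+1) = (1 + expectation)/2."""
    p_plus = 0.5 * (1 + expectation)
    return sum(1 if rng.random() < p_plus else -1 for _ in range(shots)) / shots

y = {}
for n in omega:
    y0 = (1 - gamma) * cmath.exp(-1j * E0 * n * tau) + gamma * cmath.exp(-1j * E1 * n * tau)
    y[n] = complex(hadamard_average(y0.real), hadamard_average(y0.imag))

k_star = qpe_grid_search(y, omega, N, eta3=0.1)
E_star = 2 * math.pi * k_star / (N * tau)
```

The grid shifts are spaced by $1/\lceil 2\pi/\eta_{3}\rceil\approx 0.016$ here, so one candidate $k_{\nu}$ lies within about $0.008$ of $k_{0}$ , and the empirical-error criterion picks it out.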

Theorem 1.

Given the ground state $|\Psi\rangle$ , the ground-state energy $E_{0}$ , and the initial state $|\Phi\rangle=\sqrt{1-\gamma}|\Psi\rangle+\sqrt{\gamma}|\Psi^{\perp}\rangle$ , if the noise tolerance parameters satisfy

\eta_{1}+\eta_{2}+\eta_{3}\leq\frac{1}{8},\quad{\eta_{1}\geq 2\gamma},

(40)

and the accuracy level satisfies $N\geq 100$ , then with probability at least $(1-\delta_{1})(1-\delta_{2})(1-\delta_{3})^{3}$ , where $\delta_{3}=(\delta_{1}/4N)^{1/4}$ , the output of Algorithm 1 satisfies

|E_{0}-E^{\ast}|_{\text{mod }2\pi/\tau}\leq\frac{\pi}{N\tau}\left[\frac{1.01\sqrt{3}}{2}\eta_{3}+\frac{\sqrt{6}}{2}(\eta_{1}+\eta_{2})\right]=\epsilon,

(41)

and the cost of the algorithm satisfies

	$\displaystyle T_{\max}\leq N\tau,$		(42)
	$\displaystyle T_{\text{total}}=\mathcal{O}\left(N\tau\cdot\log(N/\delta_{1}\eta_{3})\cdot\log(\log(N/\delta_{1}\eta_{3})/\delta_{2})/\eta_{2}^{2}\right).$		(43)
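Eq. (41) is easy to evaluate numerically. With $T_{\max}\leq N\tau$ from Eq. (42), the product $T_{\max}\epsilon$ is at most $\pi$ times the bracket in Eq. (41); since $1.01\sqrt{3}/2<\sqrt{6}/2$ , the bracket never exceeds $\sqrt{6}/16\approx 0.153$ when $\eta_{1}+\eta_{2}+\eta_{3}\leq 1/8$ , which is the sense in which $T_{\max}\epsilon\ll\pi$ . The split of the $\eta$ budget below is a hypothetical example.

```python
import math

def theorem1_accuracy(N, tau, eta1, eta2, eta3):
    """Right-hand side epsilon of Eq. (41)."""
    bracket = 1.01 * math.sqrt(3) / 2 * eta3 + math.sqrt(6) / 2 * (eta1 + eta2)
    return math.pi / (N * tau) * bracket

N, tau = 100, 1.0
eta1, eta2, eta3 = 0.02, 0.05, 0.05     # hypothetical split; eta1 + eta2 + eta3 <= 1/8
eps = theorem1_accuracy(N, tau, eta1, eta2, eta3)
ratio = N * tau * eps / math.pi         # upper bound on T_max * eps / pi, using T_max <= N tau
```

Here `ratio` is about $0.129$ , i.e. $T_{\max}\epsilon\lesssim 0.13\pi$ .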

Proof.

Since $\delta_{2}$ denotes the total failure probability of the Hadamard tests, by Hoeffding's inequality and a union bound, $\mathcal{O}(\log(|\Omega|/\delta_{2})/\eta_{2}^{2})$ Hadamard tests per sample suffice to guarantee that, with probability $1-\delta_{2}$ , the additive error from the Hadamard tests is smaller than $\eta_{2}$ for all $n\in\Omega$ .

Now we analyze the accuracy of $E_{0}$ . Let $k_{0}=NE_{0}\tau/2\pi$ and define $a(k)$ as follows:

a(k)=[e^{-i2\pi k/N},e^{-i2\pi k2/N},\cdots,e^{-i2\pi kN/N}]^{\top}.

(44)

Let $a(k)_{\Omega}$ be the projection of $a(k)$ on the random set $\Omega$ . The sampled signal becomes

y_{\Omega}=a(k_{0})_{\Omega}+z_{\Omega},\quad|z_{n}|\leq\bar{\eta}=\eta_{1}+\eta_{2}.

(45)

Define the total empirical error function as

\mathcal{E}(k)=\|a(k)_{\Omega}-a(k_{0})_{\Omega}-z_{\Omega}\|_{2}.

(46)

The choice of the grid shift parameters and the sampling ratio guarantee that, among the set of solutions $\mathcal{K}=\{k_{\nu}\}$ , there exists at least one $k_{1}\in\mathcal{K}$ satisfying

|k_{1}-k_{0}|_{\text{mod }N}\leq\frac{\eta_{3}}{2\pi},\quad\mathcal{E}(k_{1})\leq\sqrt{|\Omega|}\eta

(47)

with probability at least $1-\delta_{1}$. We omit the notation $\text{mod }N$ from now on. If $k_{1}$ is the unique minimizer of $\mathcal{E}$ over $\mathcal{K}$, then $k^{\ast}=k_{1}$ and the accuracy on $k^{\ast}$ is $\eta_{3}/2\pi$. Otherwise, suppose there exists $k_{2}\in\mathcal{K}$ with $\mathcal{E}(k_{2})\leq\mathcal{E}(k_{1})$ (whether or not $k_{2}$ originates from a successful run of the compressed sensing algorithm). We then need to quantify the relation between $|k_{1}-k_{0}|$ and $|k_{2}-k_{0}|$ and show that $|k_{2}-k_{0}|=\mathcal{O}(\eta)$ with high probability.

Our strategy is as follows: if $|k_{2}-k_{0}|\leq|k_{1}-k_{0}|$, then this already gives the error upper bound $\eta_{3}/2\pi$; thus we only need to consider the case $|k_{2}-k_{0}|>|k_{1}-k_{0}|$. Suppose we can find positive constants $\beta_{0},\beta_{1},\kappa$ such that for all $|k-k_{0}|\leq\kappa$,

\|a(k)_{\Omega}-a(k_{0})_{\Omega}\|_{2}\in\left[\sqrt{|\Omega|}\beta_{0}|k-k_{0}|,\sqrt{|\Omega|}\beta_{1}|k-k_{0}|\right],

(48)

then if $|k_{2}-k_{0}|<\kappa$ , we have $|k_{1}-k_{0}|<|k_{2}-k_{0}|<\kappa$ as well, and

	$\displaystyle\mathcal{E}(k_{2})\leq\mathcal{E}(k_{1}),$		(49)
	$\displaystyle\|a(k_{2})_{\Omega}-a(k_{0})_{\Omega}-z_{\Omega}\|_{2}\leq\|a(k_{1})_{\Omega}-a(k_{0})_{\Omega}-z_{\Omega}\|_{2},$		(50)
	$\displaystyle\|a(k_{2})_{\Omega}-a(k_{0})_{\Omega}\|_{2}-\sqrt{|\Omega|}\bar{\eta}\leq\|a(k_{1})_{\Omega}-a(k_{0})_{\Omega}\|_{2}+\sqrt{|\Omega|}\bar{\eta},$		(51)
	$\displaystyle\sqrt{|\Omega|}\beta_{0}|k_{2}-k_{0}|\leq\sqrt{|\Omega|}\beta_{1}|k_{1}-k_{0}|+2\sqrt{|\Omega|}\bar{\eta},$		(52)
	$\displaystyle|k_{2}-k_{0}|\leq\frac{\beta_{1}}{\beta_{0}}\cdot\frac{\eta_{3}}{2\pi}+\frac{2\bar{\eta}}{\beta_{0}}.$		(53)

On the other hand, suppose that for all $k\in\mathcal{K}$ with $k\neq k_{1}$ in the region $|k-k_{0}|>\kappa$, we have the constant lower bound:

\|a(k)_{\Omega}-a(k_{0})_{\Omega}\|_{2}\geq\sqrt{|\Omega|}\cdot c.

(54)

Using the same argument, $\mathcal{E}(k)\leq\mathcal{E}(k_{1})\leq\sqrt{|\Omega|}\eta$ leads to

\displaystyle c\leq\eta+\bar{\eta}<1/4.

(55)

Therefore, if $c>1/4$ , we can rule out the possibility of $|k_{2}-k_{0}|>\kappa$ .

Write $\|a(k)_{\Omega}-a(k_{0})_{\Omega}\|^{2}_{2}$ as the sum of the random variables $X^{\prime}_{n}=4\sin^{2}(n\pi(k-k_{0})/N)1_{n}$, i.e., four times the variables $X_{n}$ of Lemma 3 with $k$ replaced by $k-k_{0}$. According to Lemma 3 (in Appendix C), with probability at least

\begin{split}(1-\exp(-Nr/36))(1-2\exp(-Nr/24))&>(1-\exp(-Nr/36))^{3}\\ &>[1-(\delta_{1}/4N)^{1/4}]^{3},\end{split}

(56)

the following parameters suffice for Eq. (48) and Eq. (54):

\kappa=1/2,\quad\beta_{0}=\sqrt{8/3},\quad\beta_{1}=1.01\sqrt{2}\pi,\quad c=\sqrt{5/6}.

(57)

Thus, because $\sqrt{5/6}>1/4$, $k_{2}$ must satisfy $|k_{2}-k_{0}|<1/2$. In this region, we have

|k_{2}-k_{0}|\leq\frac{1.01\sqrt{3}\pi}{2}\cdot\frac{\eta_{3}}{2\pi}+\frac{\bar{\eta}}{\sqrt{2/3}}=\frac{1.01\sqrt{3}\eta_{3}}{4}+\frac{\sqrt{6}\bar{\eta}}{4},

(58)

which determines the accuracy of $|E^{\ast}-E_{0}|$ .
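The constants in Eq. (57) can be sanity-checked on a sampled set $\Omega$. The sketch below evaluates $\|a(k)_{\Omega}-a(k_{0})_{\Omega}\|_{2}/\sqrt{|\Omega|}$ on a grid of offsets $k-k_{0}$ and tests the bounds of Eqs. (48) and (54). It assumes Bernoulli sampling of $\Omega$ with ratio $r$ and normalizes by $|\Omega|$ rather than $Nr$, which is only approximately what Lemma 3 controls; the grid and helper names are our own. A finite grid can miss the worst offset, so this is a numerical check, not a proof.

```python
import math
import random

BETA0 = math.sqrt(8 / 3)
BETA1 = 1.01 * math.sqrt(2) * math.pi
C_FAR = math.sqrt(5 / 6)


def normalized_gap(dk: float, omega: list[int], N: int) -> float:
    """||a(k)_Omega - a(k0)_Omega||_2 / sqrt(|Omega|) for dk = k - k0.

    Each squared entry equals 4 sin^2(pi n dk / N), i.e. the variable X'_n of
    the proof, so only dk matters (and the value is even in dk).
    """
    total = sum(4 * math.sin(math.pi * n * dk / N) ** 2 for n in omega)
    return math.sqrt(total / len(omega))


def check_constants(omega: list[int], N: int, grid: int = 400) -> tuple[bool, bool]:
    """Check Eq. (48) for 0 < dk <= 1/2 and Eq. (54) for 1/2 < dk < N - 1/2."""
    near = [j / (2 * grid) for j in range(1, grid + 1)]
    near_ok = all(BETA0 * dk <= normalized_gap(dk, omega, N) <= BETA1 * dk for dk in near)
    # fine scan where the Dirichlet sidelobes live, coarse scan elsewhere
    far = [0.5 + j / 100 for j in range(1, 300)]
    far += [3.5 + j * (N - 4.0) / grid for j in range(grid)]
    far_ok = all(normalized_gap(dk, omega, N) >= C_FAR for dk in far)
    return near_ok, far_ok


if __name__ == "__main__":
    rng = random.Random(7)
    N, r = 1000, 0.3
    omega = [n for n in range(1, N + 1) if rng.random() < r]
    print(len(omega), check_constants(omega, N))
```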

The maximal runtime equals $\max_{n\in\Omega}n\tau$. Because $\Omega\subseteq[N]$, we have $T_{\max}\leq N\tau$. The upper bound on $T_{\text{total}}$ follows from Eq. (9) and the related parameters in the algorithm. ∎

Remark.

In order to express $T_{\max}$ and $T_{\text{total}}$ in terms of $\epsilon$ and $\gamma$ , we can choose $\eta_{2},\eta_{3}$ to be smaller than $\eta_{1}$ and obtain the following approximation:

N=\mathcal{O}\left(\frac{\max\{\eta_{1},\eta_{2},\eta_{3}\}}{\epsilon}\right)=\mathcal{O}(\gamma\epsilon^{-1}).

(59)

Thus, because $T_{\max}=\mathcal{O}(N)$ and $T_{\text{total}}=\mathcal{O}(N\log(N/\delta_{3})/\eta_{2}^{2})$, we have

T_{\max}=\mathcal{O}(\gamma\epsilon^{-1}),\quad T_{\text{total}}=\mathcal{O}\left(\gamma^{-1}\epsilon^{-1}\log(\epsilon^{-1})\right).

(60)

In Fig. 2 we demonstrate our algorithm on a simple test case where the target frequency is $f_{0}=(\chi+\omega)/N$ with $N=1000$, $\chi=20$, and $\omega=0.25$. As shown in Fig. 2, when the grid shift parameter lies in $[0,0.5]$, the integer parts of $\{k_{\nu}\}$ are all 20, which matches $\chi$. In the same region, the total empirical error is approximately a quadratic function of the grid shift parameter.

For this test, the three noise tolerance parameters are

\eta_{1}=0.1,\quad\eta_{2}\approx 0.1,\quad\eta_{3}=\frac{\pi}{10},

(61)

therefore $\eta\approx 0.344$. These noise tolerance parameters clearly violate Eq. (40), yet the algorithm still outputs the true frequency $f^{\ast}=0.02025$. This numerical experiment suggests that the parameters in Lemma 2 and Theorem 1 can be further improved.
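The final selection step of this test case can be reproduced in a few lines. The sketch is hypothetical in one important respect: it skips the compressed sensing stage and simply takes the candidates to be $k_{\nu}=\chi+\nu$ for shifts $\nu\in\{0,0.05,\ldots,0.95\}$, as if every compressed sensing run had returned the correct integer part $\chi=20$. The noiseless samples and the sampling ratio 0.05 are also our own choices.

```python
import cmath
import math
import random


def empirical_error(k, y_omega, omega, N):
    """E(k) = ||a(k)_Omega - y_Omega||_2 with a(k)_n = exp(-i 2 pi k n / N)."""
    return math.sqrt(sum(abs(cmath.exp(-2j * math.pi * k * n / N) - y) ** 2
                         for n, y in zip(omega, y_omega)))


def select_frequency(candidates, y_omega, omega, N):
    """Return k* = argmin_{k in candidates} E(k) and the frequency f* = k* / N."""
    k_star = min(candidates, key=lambda k: empirical_error(k, y_omega, omega, N))
    return k_star, k_star / N


N, chi, w = 1000, 20, 0.25                                          # Fig. 2 test case
rng = random.Random(2024)
omega = [n for n in range(1, N + 1) if rng.random() < 0.05]
k0 = chi + w
y_omega = [cmath.exp(-2j * math.pi * k0 * n / N) for n in omega]   # noiseless samples
candidates = [chi + j / 20 for j in range(20)]                      # hypothetical k_nu
k_star, f_star = select_frequency(candidates, y_omega, omega, N)
print(k_star, f_star)
```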

It is worth mentioning that if the energies of $H$ are all nearly on-grid simultaneously, then our algorithm can be directly applied to multiple phase estimation. Indeed, suppose that there are $|\Xi|$ eigenvalues with non-negligible amplitudes; then for each compressed sensing task, we can choose the $|\Xi|$ largest entries of $s^{\#}$ as a trial solution for the frequency-domain signal. Similarly, by trying all the grid shift parameters we obtain an optimal solution $s^{\ast}$, and the output is determined by the $|\Xi|$ largest entries of $s^{\ast}$:

\hat{E}_{l}=\frac{2\pi(n_{l}+\nu^{\ast})}{N\tau},\quad l=1,2,\cdots,|\Xi|.

(62)
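Eq. (62) amounts to a top-$|\Xi|$ selection followed by a rescaling. A sketch, assuming $s^{\ast}$ is stored as a list indexed by $n=0,\ldots,N-1$ and that $\nu^{\ast}$ is the optimal grid shift; the function name and the toy vector are hypothetical.

```python
import math


def multiple_phase_output(s_star, nu_star, N, tau, num_levels):
    """Energies E_l = 2 pi (n_l + nu*) / (N tau), Eq. (62), where n_l runs over
    the indices of the `num_levels` largest-magnitude entries of s*."""
    order = sorted(range(len(s_star)), key=lambda n: abs(s_star[n]), reverse=True)
    return sorted(2 * math.pi * (n + nu_star) / (N * tau) for n in order[:num_levels])


s_star = [0.0] * 16
s_star[3], s_star[10], s_star[5] = 0.7, 0.25, 0.01   # two dominant levels plus a small spurious entry
print(multiple_phase_output(s_star, nu_star=0.2, N=16, tau=0.5, num_levels=2))
```

Note that a single shift $\nu^{\ast}$ is shared by all levels, which is exactly why this only works when all frequencies are nearly on-grid at once.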

Unfortunately, there is no guarantee that all the frequencies are nearly on-grid. For example, if $f_{0}=\chi_{0}/N$ and $f_{1}=(\chi_{1}+0.5)/N$, our compressed sensing algorithm is not guaranteed to work no matter how we choose the grid shift parameter, since no single shift places both frequencies close to the grid. Essentially, the frequencies we try to estimate are continuous parameters, whereas we can only find solutions in a discrete set. This problem is known as basis mismatch in signal analysis.

4 Comparison to previous works

In this section, we compare our algorithm with previous results. First, we briefly introduce two types of QCELS algorithms: the first one [12] (named multi-level QCELS) has a hierarchical structure similar to that of Robust Phase Estimation and is used for QPE, while the second one [26] (named MM-QCELS) is designed for QEEP.

The two algorithms can be outlined as follows. Both share a hierarchical structure, namely, the algorithm is divided into several levels. At each level, the algorithm outputs an estimate of the target energy, and at the next level it searches for solutions in a narrower region and obtains a new estimate. At each level, multi-level QCELS first picks a time step $\tau$ from the sequence $\{\tau_{0},2\tau_{0},\cdots,2^{J-1}\tau_{0}\}$, then performs Hadamard tests separately for $t=\tau,2\tau,\cdots,N_{0}\tau$, where $N_{0}$ is a constant. Given the values of the noisy signal $y(t)$ at these $N_{0}$ times, the algorithm outputs an estimate of $E$ by minimizing the following cost function:

L(r,E)=\frac{1}{N_{0}}\sum_{n=1}^{N_{0}}\left|re^{-iEn\tau}-y(n\tau)\right|^{2}.

(63)
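For a single mode, the minimization over $r$ in Eq. (63) has a closed form: for fixed $E$ the optimal $r$ is $\frac{1}{N_{0}}\sum_{n}y(n\tau)e^{iEn\tau}$, and the remaining cost is $\frac{1}{N_{0}}\sum_{n}|y(n\tau)|^{2}-|r|^{2}$. The sketch below uses this and replaces the optimizer used in [12] with a brute-force grid search over $E$; the grid, the function name, and the test signal are our own assumptions.

```python
import cmath


def qcels_grid_fit(y, tau, energies):
    """Minimize L(r, E) of Eq. (63) over a grid of E; y[n - 1] holds y(n tau).

    Returns (loss, E_hat, r_hat).
    """
    N0 = len(y)
    mean_sq = sum(abs(v) ** 2 for v in y) / N0
    best = None
    for E in energies:
        # optimal complex amplitude for this E, then the reduced cost
        r = sum(y[n - 1] * cmath.exp(1j * E * n * tau) for n in range(1, N0 + 1)) / N0
        loss = mean_sq - abs(r) ** 2
        if best is None or loss < best[0]:
            best = (loss, E, r)
    return best


tau, N0, E0 = 0.1, 50, 0.73
y = [0.8 * cmath.exp(-1j * E0 * n * tau) for n in range(1, N0 + 1)]   # noiseless signal
grid = [-3 + 0.001 * j for j in range(6001)]                           # E in [-3, 3]
loss, E_hat, r_hat = qcels_grid_fit(y, tau, grid)
print(E_hat, r_hat)
```

The search range must stay within one period $2\pi/\tau$ of $E$, otherwise aliased copies of the true energy give the same cost.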

This algorithm is provably efficient for single-phase estimation but not for multiple-phase estimation. The authors later proposed a related algorithm, MM-QCELS, which modifies the previous one in two ways: the times at which the Hadamard tests are performed become random samples drawn from a probability distribution, and the cost function is changed to

L(\vec{r},\vec{E})=\frac{1}{|\mathcal{T}|}\sum_{t\in\mathcal{T}}\left|\sum_{k=1}^{K}r_{k}e^{-iE_{k}t}-y(t)\right|^{2}.

(64)

Moreover, when MM-QCELS is applied to single-phase estimation, the hierarchical structure can be removed, so that the algorithm has only one level.
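The multi-mode cost in Eq. (64) is straightforward to evaluate once the sampled times are fixed. The sketch below only evaluates the loss and does not attempt the optimization over $(\vec{r},\vec{E})$; drawing the times from a Gaussian is an illustrative assumption, not the distribution used in [26].

```python
import cmath
import random


def mm_qcels_loss(r, E, times, y):
    """Eq. (64): mean over t in T of |sum_k r_k exp(-i E_k t) - y(t)|^2."""
    total = 0.0
    for t, yt in zip(times, y):
        model = sum(rk * cmath.exp(-1j * Ek * t) for rk, Ek in zip(r, E))
        total += abs(model - yt) ** 2
    return total / len(times)


rng = random.Random(5)
times = [rng.gauss(0.0, 5.0) for _ in range(200)]      # hypothetical sampling distribution
r_true, E_true = [0.7, 0.3], [-1.2, 0.5]
y = [sum(rk * cmath.exp(-1j * Ek * t) for rk, Ek in zip(r_true, E_true)) for t in times]
print(mm_qcels_loss(r_true, E_true, times, y), mm_qcels_loss(r_true, [-1.1, 0.5], times, y))
```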

We compare multi-level QCELS and Algorithm 1 in Fig. 3. Here we use our algorithm to perform a numerical test similar to the one in Sec. V.A of [12]. It turns out that, for the chosen sampling ratio, the scaling of the averaged error with $T_{\max}$ is almost the same for both algorithms, while our $T_{\text{total}}$ is slightly larger than theirs.

Essentially, the idea of compressed sensing is similar to that of one-level MM-QCELS. As mentioned in the discussion section of [26], one-level MM-QCELS may be able to estimate the dominant phase efficiently. In both methods, one fits the sampled data with an ansatz for the signal. The difference lies in the objective function: in one-level MM-QCELS it is the total empirical error in the time domain, while in compressed sensing it is the 1-norm of the frequency-domain signal.

For single-phase estimation, our compressed-sensing-based algorithm does not have a prominent advantage compared to multi-level QCELS. However, for specific multiple-phase estimation tasks, Algorithm 1 can perform better than the one-level MM-QCELS algorithm (see, for example, Fig. 4). This is because compressed sensing works for on-grid sparse signals in general, and the energy levels of $H_{0}$ multiplied by $\tau$ can happen to be nearly on-grid at the same time.

5 Discussion

In this paper we presented a simple and robust algorithm for quantum phase estimation using compressed sensing. For single eigenvalue estimation (i.e., QPE), we rigorously established its Heisenberg-limited scaling and demonstrated numerically that its performance is comparable to that of other state-of-the-art QPE algorithms. We also explored the case of multiple eigenvalue estimation (i.e., QEEP) and showed empirically that in certain scenarios our algorithm can be more sample-efficient than previous algorithms. In practice, one could first try our algorithm on a QEEP instance: if the signal recovered by the algorithm fits the data well enough, one already has a good estimate of the eigenvalues; otherwise, one can switch to other algorithms. An excellent future direction is to design a more complete compressed-sensing-based QEEP algorithm that performs well in practice while achieving the Heisenberg limit.

One main contribution of our work is the new connection we established between QPE and compressed sensing. While some aspects of our algorithm are not yet ideal (e.g., it currently requires a higher initial overlap, i.e., a smaller $\gamma$, than other recent works [7, 12, 13]), we believe that our work provides a solid first step towards a more robust and resource-efficient compressed-sensing-based QPE algorithm for early fault-tolerant quantum computers, especially given the rich literature on compressed sensing and its efficiency and robustness demonstrated in practical applications. Moreover, in practice, before running any QPE algorithm one must first prepare an initial state suited to the given Hamiltonian. As long as we only require a constant overlap with the target eigenstate, the state preparation cost should be of approximately the same order.

Lastly, we summarize a few open questions here:

•

As we mentioned in connection with the numerical demonstration in Fig. 2, the results suggest that our algorithm is more robust than we were able to prove rigorously. It should therefore be possible to further improve the noise tolerance in Lemma 2, which would also reduce the total runtime in our analysis.
•

Instead of sampling discrete time steps, we can also sample the evolution times from a continuous region. Similar to the setup in [26], we can design a probability distribution $q(t)$ of evolution times; as long as $E_{q}[|t|]=\mathcal{O}(N^{-1})$, the Heisenberg limit is still satisfied. In principle, continuous Fourier sampling could further reduce the sample complexity and the maximal runtime.
•

Designing Heisenberg-limited QEEP algorithms based on compressed sensing. One approach that we tried is off-grid compressed sensing [35]; see Appendix D for a detailed discussion. Numerically, the off-grid algorithm appears less accurate than other state-of-the-art methods. An excellent direction would be to further improve this method, or to find an alternative that solves the basis mismatch problem in QEEP.
•

Finding the optimal grid shift parameter by optimization instead of trying every grid shift parameter in a trial set.

6 Acknowledgement

We thank Tianyu Wang for helpful discussions. C.Y. acknowledges support from the National Natural Science Foundation of China (Grant No. 92165109), National Key Research and Development Program of China (Grant No. 2022YFA1404204), and Shanghai Municipal Science and Technology Major Project (Grant No. 2019SHZDZX01). C.Z. and J.T. acknowledge support from the U.S. National Science Foundation under Grant No. 2116246, the U.S. Department of Energy, Office of Science, National Quantum Information Science Research Centers, and Quantum Systems Accelerator.

References

[1] A Yu Kitaev. Quantum measurements and the abelian stabilizer problem. arXiv preprint quant-ph/9511026, 1995.
[2] Peter W Shor. Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM review, 41(2):303–332, 1999.
[3] Daniel S Abrams and Seth Lloyd. Quantum algorithm providing exponential speed increase for finding eigenvalues and eigenvectors. Physical Review Letters, 83(24):5162, 1999.
[4] Sam McArdle, Suguru Endo, Alán Aspuru-Guzik, Simon C Benjamin, and Xiao Yuan. Quantum computational chemistry. Reviews of Modern Physics, 92(1):015003, 2020.
[5] Ruizhe Zhang, Guoming Wang, and Peter Johnson. Computing ground state properties with early fault-tolerant quantum computers. Quantum, 6:761, 2022.
[6] Rolando D Somma. Quantum eigenvalue estimation via time series analysis. New Journal of Physics, 21(12):123025, 2019.
[7] Lin Lin and Yu Tong. Heisenberg-limited ground-state energy estimation for early fault-tolerant quantum computers. PRX Quantum, 3(1):010318, 2022.
[8] Guoming Wang, Daniel Stilck-França, Ruizhe Zhang, Shuchen Zhu, and Peter D Johnson. Quantum algorithm for ground state energy estimation using circuit depth with exponentially improved dependence on precision. arXiv preprint arXiv:2209.06811, 2022.
[9] Thomas E O’Brien, Brian Tarasinski, and Barbara M Terhal. Quantum phase estimation of multiple eigenvalues for small-scale (noisy) experiments. New Journal of Physics, 21(2):023022, 2019.
[10] Alicja Dutkiewicz, Barbara M. Terhal, and Thomas E O’Brien. Heisenberg-limited quantum phase estimation of multiple eigenvalues with few control qubits. Quantum, 6:830, oct 2022.
[11] Michael A Nielsen and Isaac Chuang. Quantum computation and quantum information, 2010.
[12] Zhiyan Ding and Lin Lin. Even shorter quantum circuit for phase estimation on early fault-tolerant quantum computers with applications to ground-state energy estimation. PRX Quantum, 4:020331, May 2023.
[13] Hongkang Ni, Haoya Li, and Lexing Ying. On low-depth algorithms for quantum phase estimation. arXiv preprint arXiv:2302.02454, 2023.
[14] Iulia M Georgescu, Sahel Ashhab, and Franco Nori. Quantum simulation. Reviews of Modern Physics, 86(1):153, 2014.
[15] Andrew M Childs and Yuan Su. Nearly optimal lattice simulation by product formulas. Physical review letters, 123(5):050503, 2019.
[16] Andrew M Childs, Yuan Su, Minh C Tran, Nathan Wiebe, and Shuchen Zhu. Theory of trotter error with commutator scaling. Physical Review X, 11(1):011020, 2021.
[17] Haitham Hassanieh, Piotr Indyk, Dina Katabi, and Eric Price. Nearly optimal sparse fourier transform. In Proceedings of the forty-fourth annual ACM symposium on Theory of computing, pages 563–578, 2012.
[18] William T Cochran, James W Cooley, David L Favin, Howard D Helms, Reginald A Kaenel, William W Lang, George C Maling, David E Nelson, Charles M Rader, and Peter D Welch. What is the fast fourier transform? Proceedings of the IEEE, 55(10):1664–1674, 1967.
[19] Anna C Gilbert, Shan Muthukrishnan, and Martin Strauss. Improved time bounds for near-optimal sparse fourier representations. In Wavelets XI, volume 5914, pages 398–412. SPIE, 2005.
[20] Piotr Indyk, Michael Kapralov, and Eric Price. (nearly) sample-optimal sparse fourier transform. In Proceedings of the twenty-fifth annual ACM-SIAM symposium on Discrete algorithms, pages 480–499. SIAM, 2014.
[21] Wenjing Liao and Albert Fannjiang. MUSIC for single-snapshot spectral estimation: Stability and super-resolution. Applied and Computational Harmonic Analysis, 40(1):33–67, 2016.
[22] Richard Roy and Thomas Kailath. ESPRIT-estimation of signal parameters via rotational invariance techniques. IEEE Transactions on acoustics, speech, and signal processing, 37(7):984–995, 1989.
[23] BL Higgins, DW Berry, SD Bartlett, MW Mitchell, HM Wiseman, and GJ Pryde. Demonstrating heisenberg-limited unambiguous phase estimation without adaptive measurements. New Journal of Physics, 11(7):073023, 2009.
[24] Shelby Kimmel, Guang Hao Low, and Theodore J Yoder. Robust calibration of a universal single-qubit gate set via robust phase estimation. Physical Review A, 92(6):062315, 2015.
[25] Federico Belliardo and Vittorio Giovannetti. Achieving heisenberg scaling with maximally entangled states: An analytic upper bound for the attainable root-mean-square error. Physical Review A, 102(4), oct 2020.
[26] Zhiyan Ding and Lin Lin. Simultaneous estimation of multiple eigenvalues with short-depth quantum circuit on early fault-tolerant quantum computers. arXiv preprint arXiv:2303.05714, 2023.
[27] Haoya Li, Hongkang Ni, and Lexing Ying. On low-depth quantum algorithms for robust multiple-phase estimation. arXiv preprint arXiv:2303.08099, 2023.
[28] Emmanuel J Candes and Terence Tao. Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE transactions on information theory, 52(12):5406–5425, 2006.
[29] Emmanuel J Candes, Justin K Romberg, and Terence Tao. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences, 59(8):1207–1223, 2006.
[30] Emmanuel J Candès, Justin Romberg, and Terence Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on information theory, 52(2):489–509, 2006.
[31] David Gross, Yi-Kai Liu, Steven T Flammia, Stephen Becker, and Jens Eisert. Quantum state tomography via compressed sensing. Physical review letters, 105(15):150401, 2010.
[32] A. Smith, C. A. Riofrío, B. E. Anderson, H. Sosa-Martinez, I. H. Deutsch, and P. S. Jessen. Quantum state tomography by continuous measurement and compressed sensing. Physical Review A, 87(3), mar 2013.
[33] Amir Kalev, Robert L. Kosut, and Ivan H. Deutsch. Quantum tomography protocols with positivity are compressed sensing protocols. npj Quantum Information, 1(1):15018, Dec 2015.
[34] https://github.com/CYI1995/QEEP/tree/main/Paper_QPE.
[35] Gongguo Tang, Badri Narayan Bhaskar, Parikshit Shah, and Benjamin Recht. Compressed sensing off the grid. IEEE transactions on information theory, 59(11):7465–7490, 2013.
[36] Joel A Tropp. An introduction to matrix concentration inequalities. Foundations and Trends® in Machine Learning, 8(1-2):1–230, 2015.
[37] Yohann De Castro and Fabrice Gamboa. Exact reconstruction using beurling minimal extrapolation. Journal of Mathematical Analysis and applications, 395(1):336–354, 2012.
[38] Clarice Poon, Nicolas Keriven, and Gabriel Peyré. The geometry of off-the-grid compressed sensing. Foundations of Computational Mathematics, 23(1):241–327, 2023.

Appendix A Proof of Lemma 1

The proof largely follows that of Lemma 6.6 of [28].

Proof.

By the definition of $p$ , we can verify that $p_{\chi}=1$ . For all $k\neq\chi$ , we have

p_{k}=\frac{1}{|\Omega|}\sum_{n\in\Omega}e^{i2\pi\bar{k}n/N},\quad\bar{k}=k-\chi.

(65)

Our goal is to bound $P(|p_{k}|>1/2)$. Using

	$\displaystyle P(|p_{k}|\cdot|\Omega|>Nr/4)\geq P(\{|p_{k}|\cdot|\Omega|>|\Omega|/2\}\cap\{|\Omega|>Nr/2\}),$		(66)
	$\displaystyle P(A\cap B)=P(A)+P(B)-P(A\cup B)\geq P(A)+P(B)-1,$		(67)
	$\displaystyle P(|p_{k}|\cdot|\Omega|>Nr/4)\geq P(|p_{k}|\cdot|\Omega|>|\Omega|/2)+P(|\Omega|>Nr/2)-1,$		(68)

we obtain

P(|p_{k}|>1/2)\leq P(|\Omega|\leq Nr/2)+P(|p_{k}|\cdot|\Omega|>Nr/4).

(69)

Using Bernstein’s inequality, the first term is bounded by

P(|\Omega|\leq Nr/2)<\exp\left(-\frac{3}{28}\frac{Nr}{1-r}\right).

(70)

To estimate the second term, we relate $|p_{k}|$ to the sum of a sequence of random matrices:

M_{n}=R_{n}(1_{n}-r),\quad R_{n}=\left(\begin{array}[]{cc}\cos(2\pi n\bar{k}/N)&\sin(2\pi n\bar{k}/N)\\ -\sin(2\pi n\bar{k}/N)&\cos(2\pi n\bar{k}/N)\end{array}\right),

(71)

where $\{1_{n}\}$ are defined in Eq. (18). Since $\sum_{n\in[N]}R_{n}=0$ when $\bar{k}\neq 0$, the sum of the $M_{n}$ is

\sum_{n\in[N]}M_{n}=\sum_{n\in[N]}R_{n}1_{n}=\sum_{n\in\Omega}R_{n}

(72)

which implies

\|\sum_{n\in[N]}M_{n}\|=\|\sum_{n\in\Omega}R_{n}\|=|\sum_{n\in\Omega}e^{i2\pi n\bar{k}/N}|=|\Omega|\cdot|p_{k}|.

(73)

The matrix Bernstein’s inequality (Theorem 1.6.2 of [36]) states that

P\left(\|\sum_{n\in[N]}M_{n}\|>\lambda\right)<4\exp\left(-\frac{\lambda^{2}/2}{v+L\lambda/3}\right)

(74)

with

\displaystyle v=\|\sum_{n\in[N]}E[M_{n}M^{\dagger}_{n}]\|=Nr(1-r),\quad L=\max_{n}\|M_{n}\|=1-r.

(75)

Choosing $\lambda=Nr/4$, we obtain

P(\|\sum_{n\in[N]}M_{n}\|>Nr/4)<4\exp\left(-\frac{3}{104}\frac{Nr}{1-r}\right).

(76)

We want $P(|\Omega|\leq Nr/2)+P(|p_{k}|\cdot|\Omega|>Nr/4)$ to be smaller than $\delta/N$. Since $3/28>3/104$, Eqs. (70) and (76) show that this sum is smaller than $5\exp\left(-\frac{3}{104}\frac{Nr}{1-r}\right)$, so it suffices that

	$\displaystyle\exp\left(-\frac{3}{104}\frac{Nr}{1-r}\right)<\frac{\delta}{5N},$		(77)
	$\displaystyle r>\frac{104}{3N}\ln\left(\frac{5N}{\delta}\right)/\left[1+\frac{104}{3N}\ln\left(\frac{5N}{\delta}\right)\right].$		(78)

Therefore, if we choose

r>\frac{35}{N}\ln\left(\frac{5N}{\delta}\right),

(79)

then, by a union bound over all $k\neq\chi$, $p$ can serve as a dual certificate with probability at least $1-\delta$.

∎
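Lemma 1 can be probed numerically. The sketch below computes the sampling ratio of Eq. (79), draws $\Omega$ by including each $n\in[N]$ independently with probability $r$ (our reading of the indicators $1_{n}$, consistent with the variance $Nr(1-r)$ in Eq. (75)), and returns $\max_{\bar{k}\neq 0}|p_{k}|$, which should not exceed $1/2$ if $p$ is a valid dual certificate. Note that Eq. (79) yields $r<1$ only for sufficiently large $N$; for example, $N=1024$ and $\delta=0.1$ give $r\approx 0.37$. The helper names are hypothetical.

```python
import cmath
import math
import random


def sampling_ratio(N: int, delta: float) -> float:
    """Sampling ratio of Eq. (79): r = (35 / N) ln(5 N / delta)."""
    return 35.0 / N * math.log(5.0 * N / delta)


def max_offpeak(omega: list[int], N: int) -> float:
    """max over kbar = 1..N-1 of |p_kbar|, with p from Eq. (65)."""
    roots = [cmath.exp(2j * math.pi * j / N) for j in range(N)]   # e^{i 2 pi j / N}
    worst = 0.0
    for kbar in range(1, N):
        s = sum(roots[(kbar * n) % N] for n in omega)
        worst = max(worst, abs(s) / len(omega))
    return worst


if __name__ == "__main__":
    N, delta = 1024, 0.1
    r = sampling_ratio(N, delta)
    rng = random.Random(11)
    omega = [n for n in range(1, N + 1) if rng.random() < r]
    print(r, len(omega), max_offpeak(omega, N))
```

In such runs the off-peak maximum typically sits well below $1/2$, consistent with the remark after Theorem 1 that the constants are conservative.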

Appendix B Proof of Lemma 2

Proof.

In this problem, we have

y_{\Omega}=y^{0}_{\Omega}+z_{\Omega},\quad y^{0}=F^{\dagger}x,\quad x_{k}=\delta_{k,\chi}.

(80)

According to Lemma 1, with high probability the dual certificate $p=F^{\dagger}_{\Omega}y^{0}_{\Omega}/|\Omega|$ satisfies

p_{\chi}=1;\quad|p_{k}|\leq\frac{1}{2},\ \forall k\neq\chi.

(81)

Calculate the inner product to obtain

$\displaystyle\langle s^{\#},p\rangle$	$\displaystyle=\frac{1}{|\Omega|}\langle s^{\#},F^{\dagger}_{\Omega}y^{0}_{\Omega}\rangle$	(82)
	$\displaystyle=\frac{1}{|\Omega|}\langle F_{\Omega}s^{\#},y^{0}_{\Omega}\rangle$
	$\displaystyle=\frac{1}{|\Omega|}\langle y_{\Omega}+\tilde{z}_{\Omega},y^{0}_{\Omega}\rangle$
	$\displaystyle=\frac{1}{|\Omega|}\langle y^{0}_{\Omega}+z_{\Omega}+\tilde{z}_{\Omega},y^{0}_{\Omega}\rangle$
	$\displaystyle=1+\frac{1}{|\Omega|}\langle z_{\Omega}+\tilde{z}_{\Omega},y^{0}_{\Omega}\rangle.$

Here $\tilde{z}_{\Omega}=F_{\Omega}s^{\#}-y_{\Omega}$ denotes the residual of the compressed sensing solution, which satisfies $\|\tilde{z}_{\Omega}\|_{2}\leq\sqrt{|\Omega|}\eta$. Furthermore,

\frac{1}{|\Omega|}|\langle z_{\Omega},y^{0}_{\Omega}\rangle|\leq\frac{1}{|\Omega|}\sum_{n\in\Omega}|z_{n}|\leq\frac{1}{\sqrt{|\Omega|}}\sqrt{\sum_{n\in\Omega}|z_{n}|^{2}}\leq\eta.

(83)

The same bound holds with $z_{\Omega}$ replaced by $\tilde{z}_{\Omega}$. Thus,

1-2\eta\leq|\langle s^{\#},p\rangle|\leq 1+2\eta.

(84)

On the other hand, $x$ is a feasible solution, hence $\|s^{\#}\|_{1}\leq\|x\|_{1}=1$; combined with Eq. (81), this gives

|\langle s^{\#},p\rangle|\leq|s^{\#}_{\chi}|+\frac{1}{2}\sum_{k\neq\chi}|s^{\#}_{k}|\leq 1-\frac{1}{2}\sum_{k\neq\chi}|s^{\#}_{k}|.

(85)

Combining them together, we obtain

\sum_{k\neq\chi}|s^{\#}_{k}|\leq 4\eta,\quad|s^{\#}_{\chi}|\geq 1-2\eta-\frac{1}{2}\sum_{k\neq\chi}|s^{\#}_{k}|\geq 1-4\eta.

(86)

∎

Appendix C Concentration of $X_{n}$

Here we define

|k|_{\text{mod }N}=\min_{z\in\mathbb{Z}}|k-zN|,

(87)

then we can prove

Lemma 3.

For $N\geq 100$ , consider the summation of random variables

X_{n}=\sin^{2}(n\pi k/N)1_{n}

(88)

with $k$ a parameter in the region $(-N,N)$ and $1_{n}$ as defined in Eq. (18). For each $k$ with $|k|_{\text{mod }N}>1/2$, with probability at least $1-\exp(-Nr/36)$, we have

\sum_{n=1}^{N}X_{n}>\frac{5Nr}{24}.

(89)

With probability at least $1-2\exp(-Nr/24)$ , we have for all $|k|_{\text{mod }N}\leq 1/2$ ,

\sum_{n=1}^{N}X_{n}\in\left[\frac{2k^{2}}{3}Nr,\frac{1.01^{2}\pi^{2}k^{2}}{2}Nr\right].

(90)

Proof.

Start with the expectation value of $S_{n}=\sum_{n\in[N]}X_{n}$ :

$\displaystyle E[S_{n}]$	$\displaystyle=r\sum_{n=1}^{N}\sin^{2}(n\pi k/N)$	(91)
	$\displaystyle=r\sum_{n=1}^{N}\frac{2-e^{i2n\pi k/N}-e^{-i2n\pi k/N}}{4}$	(92)
	$\displaystyle=\frac{Nr}{2}-\frac{r}{4}\sum_{n=1}^{N}(e^{i2n\pi k/N}+e^{-i2n\pi k/N})$	(93)
	$\displaystyle=\frac{Nr}{2}-\frac{r}{2}\cos\left(\frac{\pi k(N+1)}{N}\right)\frac{\sin(\pi k)}{\sin(\pi k/N)}.$	(94)

We verify that when $|k|_{\text{mod }N}\in(1/2,1]$, we have $\cos(\pi k(N+1)/N)<0$ and $\sin(\pi k)/\sin(\pi k/N)\geq 0$, thus $E[S_{n}]\geq Nr/2$; when $|k|_{\text{mod }N}>1$, we use

E[S_{n}]>\frac{Nr}{2}-\frac{Nr}{2}\left|\frac{\sin(\pi k)}{N\sin(\pi k/N)}\right|.

(95)

Notice that $\sin(\pi k)/(N\sin(\pi k/N))$ is the normalized Dirichlet kernel, which is sharply peaked around $k=0$. With $N\geq 100$, one can verify that for $|k|_{\text{mod }N}>1$, because the second peak of the Dirichlet kernel lies in the region $(1,2)$, we have

	$\displaystyle\left|\frac{\sin(\pi k)}{N\sin(\pi k/N)}\right|<\frac{1}{N\sin(2\pi/N)}<\frac{1}{6},$		(96)
	$\displaystyle E[S_{n}]>\frac{5}{12}Nr.$		(97)

Therefore, when $|k|_{\text{mod }N}>1/2$ , we have $E[S_{n}]>5Nr/12$ , and $X_{n}\in[0,1]$ for all $n$ . By Bernstein’s inequality,

P\left[S_{n}-E[S_{n}]<-E[S_{n}]/2\right]<\exp\left(-\frac{E[S_{n}]^{2}/8}{r\sum\sin^{4}(\pi nk/N)+E[S_{n}]/6}\right).

(98)

After calculation, $\sum\sin^{4}(\pi nk/N)$ is close to $3N/8$ . Thus, with probability at least

1-\exp(-Nr/36),

(99)

we have

\sum_{n\in[N]}X_{n}=S_{n}>\frac{1}{2}E[S_{n}]>\frac{5Nr}{24}.

(100)

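This is the first conclusion. In the other case, where $|k|_{\text{mod }N}\leq 1/2$, we may assume $|k|\leq 1/2$, because $X_{n}$ is unchanged under $k\to k\pm N$. Then $|n\pi k/N|\leq\pi/2$ for all $n\in[N]$, and on this interval the $\sin$ function satisfies

1\geq\left|\frac{\sin x}{x}\right|\geq\frac{2}{\pi},\quad|x|\leq\frac{\pi}{2},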

(101)

therefore we have

\frac{4k^{2}}{N^{2}}\sum_{n\in[N]}n^{2}1_{n}\leq\sum_{n\in[N]}X_{n}\leq\frac{\pi^{2}k^{2}}{N^{2}}\sum_{n\in[N]}n^{2}1_{n}

(102)

Similarly, using Bernstein’s inequality, we obtain

P\left[\left|\sum n^{2}1_{n}-r\sum n^{2}\right|>\frac{r}{2}\sum n^{2}\right]\leq 2\exp\left(-\frac{(r\sum n^{2})^{2}/8}{r\sum n^{4}+N^{2}r\sum n^{2}/6}\right).

(103)

Because $\sum n^{2}=N(N+1)(2N+1)/6\approx N^{3}/3$ and $\sum_{n}n^{4}\approx N^{5}/5$, with probability at least $1-2\exp(-Nr/24)$, we have

	$\displaystyle\frac{Nr}{2}E[n^{2}]\leq\sum_{n\in\Omega}n^{2}\leq\frac{3Nr}{2}E[n^{2}],$		(104)
	$\displaystyle\frac{2k^{2}r}{N}E[n^{2}]\leq\sum_{n\in\Omega}X_{n}\leq\frac{3\pi^{2}k^{2}r}{2N}E[n^{2}].$		(105)

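Here $E[n^{2}]=\frac{1}{N}\sum_{n=1}^{N}n^{2}\approx N^{2}/3$ denotes the average of $n^{2}$ over $[N]$. After simplification, the second conclusion is obtained. ∎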
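The closed form in Eq. (94) and the bound $E[S_{n}]>5Nr/12$ for $|k|_{\text{mod }N}>1/2$ are easy to check numerically. The sketch below compares Eq. (94) with the direct sum in Eq. (91) and scans $k$ on a grid of spacing 0.01 over $(1/2,N-1/2)$; since $\sin^{2}(n\pi k/N)$ is invariant under $k\to -k$ and $k\to N-k$, this covers the region $|k|_{\text{mod }N}>1/2$ up to grid resolution. The grid spacing and the values of $N$ and $r$ are our own choices.

```python
import math


def expected_S_direct(k: float, N: int, r: float) -> float:
    """E[S_n] = r * sum_{n=1}^N sin^2(n pi k / N), Eq. (91)."""
    return r * sum(math.sin(n * math.pi * k / N) ** 2 for n in range(1, N + 1))


def expected_S_closed(k: float, N: int, r: float) -> float:
    """Closed form of Eq. (94); valid whenever sin(pi k / N) != 0."""
    kernel = math.sin(math.pi * k) / math.sin(math.pi * k / N)
    return N * r / 2 - r / 2 * math.cos(math.pi * k * (N + 1) / N) * kernel


def min_far_ratio(N: int, r: float) -> float:
    """min of E[S_n] / (N r) over k = 0.51, 0.52, ..., N - 0.51."""
    ks = (0.5 + j / 100 for j in range(1, 100 * (N - 1)))
    return min(expected_S_closed(k, N, r) for k in ks) / (N * r)


if __name__ == "__main__":
    N, r = 100, 0.2
    for k in (0.3, 1.7, 12.4, 63.25):
        print(k, expected_S_direct(k, N, r), expected_S_closed(k, N, r))
    print(min_far_ratio(N, r), 5 / 12)
```

The scan places the minimum in the first sidelobe region $k\in(1,1.5)$, where the factor $\cos(\pi k(N+1)/N)$ in Eq. (94) partially offsets the Dirichlet kernel.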

Appendix D Off-grid compressed sensing

In this section we discuss an off-grid compressed sensing algorithm for multiple eigenvalue estimation. Although the algorithm described below works well numerically, rigorously proving that it achieves the Heisenberg limit is difficult, and we leave a full investigation to future work.

In regular compressed sensing, the optimization algorithm Eq. (21) can be rewritten in the following concise form:

\min\|Fr\|_{1},\quad\text{s.t.}\quad r_{\Omega}=y_{\Omega},

(106)

where $r$ is the time-domain ansatz of the signal rather than the frequency-domain ansatz. The idea of off-grid compressed sensing is similar, except that we need a different norm that accommodates continuous values of the frequencies [35]:

	$\displaystyle\min\|r\|_{\mathcal{A}},\quad\text{s.t.}\quad r_{\Omega}=y_{\Omega},$		(107)
	$\displaystyle\|r\|_{\mathcal{A}}=\inf_{c_{f}\geq 0,\phi_{f}\in[0,2\pi),f\in[0,1)}\left\{\sum_{f}c_{f}:r_{n}=\sum_{f}c_{f}e^{i(2\pi fn+\phi_{f})}\right\}.$		(108)

The new norm $\|\cdot\|_{\mathcal{A}}$ is called the atomic norm; it can be computed as the optimal value of an SDP (see Proposition 2.1 of [35]):

\|r\|_{\mathcal{A}}=\inf_{u,v}\left\{\frac{1}{2N}\text{tr}(\text{Toep}(u))+\frac{v}{2}:\begin{bmatrix}\text{Toep}(u)&r\\ r^{\dagger}&v\end{bmatrix}\succeq 0\right\},

(109)

where $u$ is a $(2N+1)$-dimensional vector (in the off-grid case, instead of choosing $n$ from $[N]$, we allow it to take values from $-N$ to $N$), $v$ is a real number, and $\text{Toep}(u)$ denotes the Hermitian Toeplitz matrix whose first column is $u$:

(\text{Toep}(u))_{mn}=u_{m-n},\quad u_{-n}=u_{n}^{\dagger}.

(110)

Thus, the full optimization task for the off-grid compressed sensing is

\min_{u,r,v}\frac{1}{2N}\text{tr}(\text{Toep}(u))+\frac{v}{2},\quad\text{s.t.}\quad\begin{bmatrix}\text{Toep}(u)&r\\ r^{\dagger}&v\end{bmatrix}\succeq 0,\quad r_{\Omega}=y_{\Omega}.

(111)

Suppose the noise in each sample is bounded by $\eta$; then the robust version that tolerates small noise is given by

\min_{u,r,v}\frac{1}{2}\|r_{\Omega}-y_{\Omega}\|_{2}^{2}+\lambda\left(\frac{1}{2N}\text{tr}(\text{Toep}(u))+\frac{v}{2}\right),\quad\text{s.t.}\quad\begin{bmatrix}\text{Toep}(u)&r\\ r^{\dagger}&v\end{bmatrix}\succeq 0,

(112)

with $\lambda=\Theta(\eta\sqrt{|\Omega|/|\mathcal{F}|})$ . This is the so-called Beurling-Lasso algorithm [37].

The next theorem shows the efficiency of the algorithm; it is simply a restatement of the results in [38] in our notation.

Theorem 2.

Suppose we have the signal

y_{n}=y^{0}_{n}+z_{n},\quad y^{0}_{n}=\sum_{f\in\mathcal{F}}c_{f}e^{i2\pi fn},\quad n=-N,-N+1,\cdots,N,

(113)

where $c_{f}$ are positive numbers summing to 1 and $f\in[0,1)$. We know the values of $y_{n}$ on the set $\Omega$, a set of integers sampled from $-N$ to $N$. The solution $r^{\#}$ to Eq. (112) can be decomposed as

r^{\#}_{n}=\sum_{g\in\mathcal{G}}d_{g}e^{i2\pi gn}.

(114)

To quantify the similarity between $\mathcal{G}$ and $\mathcal{F}$ , define

	$\displaystyle\Delta_{\mathcal{F}}=\min_{m\in\mathbb{Z}}\min_{f_{1}\neq f_{2}\in\mathcal{F}}|f_{1}-f_{2}+m|,$		(115)
	$\displaystyle R^{\text{near}}_{f}=\left\{k:\frac{\pi^{2}}{3}N(N+4)(k-f)^{2}\leq\frac{1}{128}\right\},$		(116)
	$\displaystyle R^{\text{far}}=[0,1)\setminus\bigcup_{f\in\mathcal{F}}R^{\text{near}}_{f}.$		(117)

Then the estimator for $f$ is

\hat{f}=\text{average value of }\{g:g\in R^{\text{near}}_{f}\},

(118)

and the estimator for $c_{f}$ is

\hat{c}_{f}=\sum_{g\in R^{\text{near}}_{f}}d_{g}.

(119)

If $N\geq 128$, $\|z_{\Omega}\|_{2}\leq\sqrt{|\Omega|}\eta$, $\Delta_{\mathcal{F}}>\frac{2|\mathcal{F}|^{1/4}}{\sqrt{N(N+4)}}$, $\lambda=\Theta(\eta\sqrt{|\Omega|/|\mathcal{F}|})$, and

|\Omega|\geq\mathcal{O}\left(\max\left\{|\mathcal{F}|\log\frac{N}{\delta},|\mathcal{F}|\log^{2}\frac{|\mathcal{F}|}{\delta}\right\}\right),

(120)

then, with the constant $\varepsilon_{0}\approx 0.000504$, we have

	$\displaystyle|\hat{c}_{f}-c_{f}|\leq 8\sqrt{|\Omega|}\eta+(8\lambda+\sqrt{|\Omega|}\eta)\sqrt{|\mathcal{F}|}+4\lambda+\frac{1-\varepsilon_{0}}{\varepsilon_{0}}\frac{(\sqrt{|\Omega|}\eta+\lambda\sqrt{|\mathcal{F}|})^{2}}{2\lambda},$		(121)
	$\displaystyle|\hat{f}-f|_{\text{mod }1}\leq\frac{1}{8\sqrt{6}N},\quad\sum_{g\in R^{\text{far}}}|d_{g}|\leq\frac{(\sqrt{|\Omega|}\eta+\lambda\sqrt{|\mathcal{F}|})^{2}}{2\varepsilon_{0}\lambda}$		(122)

with probability at least $1-\delta$ .
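The estimators of Eqs. (118) and (119) are defined relative to the true support $\mathcal{F}$, so they serve as an analysis device rather than a procedure one can run without knowing $\mathcal{F}$. With that caveat, the sketch below takes the atoms $(g,d_{g})$ of Eq. (114) together with $\mathcal{F}$ and returns $(\hat{f},\hat{c}_{f})$ for each $f$, as well as the mass $\sum_{g\in R^{\text{far}}}|d_{g}|$ bounded in Eq. (122). Measuring $k-f$ in Eq. (116) modulo 1, and averaging without wrap-around (so no $f$ should sit near 0 or 1), are our assumptions; the atoms are made-up values.

```python
import math


def in_near_region(g: float, f: float, N: int) -> bool:
    """Membership in R^near_f of Eq. (116), with k - f measured modulo 1."""
    d = abs(g - f) % 1.0
    d = min(d, 1.0 - d)
    return math.pi ** 2 / 3 * N * (N + 4) * d ** 2 <= 1 / 128


def theorem2_estimators(atoms, F, N):
    """atoms: list of (g, d_g). Returns ({f: (f_hat, c_hat)}, far_mass)."""
    estimates, near_idx = {}, set()
    for f in F:
        idx = [i for i, (g, _) in enumerate(atoms) if in_near_region(g, f, N)]
        near_idx.update(idx)
        f_hat = sum(atoms[i][0] for i in idx) / len(idx) if idx else math.nan   # Eq. (118)
        c_hat = sum(atoms[i][1] for i in idx)                                   # Eq. (119)
        estimates[f] = (f_hat, c_hat)
    far_mass = sum(abs(d) for i, (_, d) in enumerate(atoms) if i not in near_idx)
    return estimates, far_mass


atoms = [(0.2001, 0.3), (0.1999, 0.2), (0.6002, 0.45), (0.9, 0.01)]   # hypothetical r^# atoms
est, far = theorem2_estimators(atoms, F=[0.2, 0.6], N=128)
print(est, far)
```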

More detailed discussions can be found in Proposition 1, Proposition 2, and Appendix C of [38]. Roughly speaking, as in regular compressed sensing, $\mathcal{O}(\log N)$ random samples are enough to approximately recover the noisy signal. However, the output $r^{\#}$ of off-grid compressed sensing is an estimate of the time-domain signal, not the frequency-domain signal, so an extra step is needed to recover the frequency support. In signal recovery, when the minimal frequency gap of the signal is lower bounded by $N^{-1}$, the frequency support is said to be well-separated and can be recovered easily. However, we have no guarantee on the minimal frequency gap of the output signal $r^{\#}$, so the original method in [35], which relies on reconstructing the dual certificate, is not guaranteed to work. Moreover, in practice the performance of the original method is very sensitive to noise, which makes it impractical. We therefore need a different post-processing procedure to estimate $\mathcal{F}$ from $r^{\#}$.

The multiple signal classification (MUSIC) algorithm [21] is a strong candidate for this post-processing step. It is built on the Hankel matrix of the signal. For $t$ ranging from $-N$ to $N$, this matrix is defined as

(\text{Hank}(y))_{mn}=y_{m+n-N-1},\quad m,n\in[N].

(123)
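The index bookkeeping in (123) can be sketched as follows. The sketch assumes $[N]=\{1,\dots,N\}$ and stores $y_{-N},\dots,y_{N}$ in a list with offset $N$, so that $y_t$ is `y[t + N]`. The name `hankel` is a hypothetical helper, not taken from the source.

```python
def hankel(y, N):
    """Build the N x N matrix (Hank(y))_{mn} = y_{m+n-N-1}, m, n = 1..N.

    `y` holds y_{-N}, ..., y_{N}; the entry y_t is stored at y[t + N].
    """
    assert len(y) == 2 * N + 1, "expected samples for t = -N..N"
    return [[y[(m + n - N - 1) + N] for n in range(1, N + 1)]
            for m in range(1, N + 1)]
```

Under this reading of $[N]$, only $y_{-(N-1)},\dots,y_{N-1}$ enter the matrix, and the matrix is symmetric because its entries depend only on $m+n$.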

For a noisy signal $y_{n}=y^{0}_{n}+z_{n}$, linearity lets us split the Hankel matrix into two parts:

\text{Hank}(y)=\text{Hank}(y^{0})+\text{Hank}(z).

(124)

The noiseless part admits the following decomposition:

	$\displaystyle\text{Hank}(y^{0})$	$\displaystyle=\sum_{f\in\mathcal{F}}c_{f}e^{-i2\pi(N-1)f}a(f)a(f)^{\top},$		(125)
	$\displaystyle a(f)$	$\displaystyle=[e^{-i2\pi f}\ e^{-i2\pi 2f}\ \cdots\ e^{-i2\pi Nf}]^{\top}.$		(126)
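The rank structure behind (125) and (126) can be checked numerically. The sketch below assumes the signal model $y^{0}_{t}=\sum_{f}c_{f}e^{-i2\pi ft}$, chosen to match the sign convention of $a(f)$. It also assumes $[N]=\{1,\dots,N\}$. Under these assumptions it confirms two things: $\text{Hank}(y^{0})$ has rank $|\mathcal{F}|$, and each $a(f)$ lies in the span of its leading left singular vectors. The frequencies and amplitudes are arbitrary test values.

```python
import numpy as np

def steering(f, N):
    """a(f) = [e^{-i2pi f}, e^{-i2pi 2f}, ..., e^{-i2pi N f}]^T, as in (126)."""
    return np.exp(-2j * np.pi * f * np.arange(1, N + 1))

def clean_hankel(freqs, amps, N):
    """Hank(y^0) for the assumed model y^0_t = sum_f c_f e^{-i2pi f t}."""
    idx = np.arange(1, N + 1)
    t = idx[:, None] + idx[None, :] - N - 1   # t = m + n - N - 1
    return sum(c * np.exp(-2j * np.pi * f * t) for f, c in zip(freqs, amps))

N = 32
freqs, amps = [0.12, 0.31, 0.77], [0.5, 0.3, 0.2]
H0 = clean_hankel(freqs, amps, N)
U, s, _ = np.linalg.svd(H0)
rank = int(np.sum(s > 1e-8 * s[0]))          # numerical rank; should equal len(freqs)
P1 = U[:, :rank] @ U[:, :rank].conj().T      # projector onto the leading left singular vectors
# distance of each a(f) from that span; should be at round-off level
residuals = [np.linalg.norm(steering(f, N) - P1 @ steering(f, N)) for f in freqs]
```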

Therefore, every vector $a(f)$ with $f\in\mathcal{F}$ lies in the span of the left singular vectors associated with the $|\mathcal{F}|$ largest singular values of $\text{Hank}(y^{0})$, that is, in its column space. Let $P_{\text{null}}$ be the projector onto the noise space, the orthogonal complement of this column space. Define the noise-space correlation function as

C_{0}(k)=\frac{\|P_{\text{null}}a(k)\|_{2}}{\|a(k)\|_{2}}.

(127)

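It turns out that $C_{0}(k)=0$ if and only if $k\in\mathcal{F}$ (see Theorem 1 of [21]). Similarly, let $C(k)$ be the noise-space correlation function computed from $\text{Hank}(y)$, with the noise space taken as the orthogonal complement of the span of its $|\mathcal{F}|$ leading left singular vectors. As long as $\text{Hank}(z)$ does not perturb this subspace too much, the $|\mathcal{F}|$ smallest local minima of $C(k)$ serve as estimators of the elements of $\mathcal{F}$.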
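The MUSIC post-processing described above can be sketched as follows. The sketch computes the SVD of the Hankel matrix, the projector onto the leading $|\mathcal{F}|$ left singular vectors, and the local minima of $C(k)$. It works on a fully sampled signal $y_t$, $t=-N,\dots,N$; in Algorithm 2 that role is played by $r^{\#}$. Two choices here are assumptions rather than details from the source: locating local minima on a uniform grid over $[0,1)$, and the name `music_estimate`. The sketch also assumes $[N]=\{1,\dots,N\}$.

```python
import numpy as np

def music_estimate(y, N, n_freq, grid_size=10000):
    """MUSIC on y_t, t = -N..N (y[t + N] holds y_t); returns n_freq frequency estimates."""
    idx = np.arange(1, N + 1)
    # (Hank(y))_{mn} = y_{m+n-N-1}, stored at array position (m+n-N-1) + N = m+n-1
    H = y[idx[:, None] + idx[None, :] - 1]
    U, _, _ = np.linalg.svd(H)
    Us = U[:, :n_freq]                            # leading |F| left singular vectors
    k = np.arange(grid_size) / grid_size          # candidate frequencies in [0, 1)
    A = np.exp(-2j * np.pi * np.outer(idx, k))    # column j is a(k_j); ||a(k)|| = sqrt(N)
    resid = A - Us @ (Us.conj().T @ A)            # (I - P_1) a(k)
    C = np.linalg.norm(resid, axis=0) / np.sqrt(N)
    # local minima of C on the periodic grid
    is_min = (C < np.roll(C, 1)) & (C <= np.roll(C, -1))
    cand = np.flatnonzero(is_min)
    best = cand[np.argsort(C[cand])[:n_freq]]     # the n_freq smallest local minima
    return np.sort(k[best])
```

For example, sample $y_t=\sum_f c_f e^{-i2\pi ft}$ at three well-separated frequencies with $N=32$, with or without small complex Gaussian noise added. The function then returns grid points close to the true frequencies.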

Combining the MUSIC algorithm with off-grid compressed sensing, we obtain Algorithm 2 for the QEEP. A numerical test demonstrating the efficiency of the algorithm is shown in Fig. 5.

Algorithm 2 QEEP

Input: accuracy level $N$, time step $\tau$, size of frequency support $|\mathcal{F}|$, expected failure probabilities $\delta_{1},\delta_{2}$, Hamiltonian $H$, and an initial state $|\Phi\rangle$

Output: the $|\mathcal{F}|$ smallest local minima of $C(k)$

1: Sample $\mathcal{O}\left(\max\left\{|\mathcal{F}|\log(N/\delta_{1}),|\mathcal{F}|\log^{2}(|\mathcal{F}|/\delta_{1})\right\}\right)$ integers from $[-N,N]$ at random to form the index set $\Omega$
2: for $n\in\Omega$ do
3:   Prepare the initial state $|\Phi\rangle$ and the unitary operator $e^{-iHn\tau}$;
4:   Perform Hadamard tests of $e^{-iHn\tau}$ on $|\Phi\rangle$ $\mathcal{O}(\log(|\Omega|\delta_{2}^{-1}))$ times;
5:   Calculate the average value of the test outcomes as the signal $y_{n}$
6: end for
7: Solve

\min_{u,r,v}\frac{1}{2}\|r_{\Omega}-y_{\Omega}\|_{2}^{2}+\lambda\left(\frac{1}{2N}\text{tr}(\text{Toep}(u))+\frac{v}{2}\right),\quad\text{s.t.}\quad\begin{bmatrix}\text{Toep}(u)&r\\ r^{\dagger}&v\end{bmatrix}\succeq 0

   to obtain the solution $u^{\#},r^{\#},v^{\#}$
8: Perform a singular value decomposition of $\text{Hank}(r^{\#})$:

\text{Hank}(r^{\#})=[u_{1},u_{2},\cdots,u_{|\mathcal{F}|},\cdots]\,\text{diag}\{\sigma_{1},\sigma_{2},\cdots,\sigma_{|\mathcal{F}|},\cdots\}\,[v_{1},v_{2},\cdots,v_{|\mathcal{F}|},\cdots]^{\dagger},

   where $\sigma_{1},\sigma_{2},\cdots,\sigma_{|\mathcal{F}|}$ are the $|\mathcal{F}|$ largest singular values
9: Let $P_{1}$ be the projector onto $\text{span}\{u_{1},u_{2},\cdots,u_{|\mathcal{F}|}\}$
10: Compute the noise-space correlation function $C(k)=\|(I-P_{1})a(k)\|_{2}/\|a(k)\|_{2}$
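The data-acquisition part of Algorithm 2, sampling $\Omega$ and averaging Hadamard-test outcomes, can be mimicked classically. This is an idealized sketch resting on several assumptions:

- Each shot is modeled as a $\pm 1$ outcome whose mean is the real or imaginary part of $\langle\Phi|e^{-iHn\tau}|\Phi\rangle$.
- The spectrum given by `energies` and `weights` (eigenvalues $E_k$ and overlaps $|\langle E_k|\Phi\rangle|^2$) is a hypothetical test instance.
- `const` stands in for the unspecified constant in the $\mathcal{O}(\cdot)$ bound.
- Drawing distinct integers uniformly is an illustrative choice.

```python
import cmath
import math
import random

def sample_times(N, n_F, delta1, const=1.0, rng=random):
    """Draw |Omega| distinct integers from [-N, N] uniformly at random.

    |Omega| = const * max(|F| log(N/delta1), |F| log^2(|F|/delta1)), where `const`
    is a hypothetical stand-in for the unspecified O(.) constant.
    """
    m = math.ceil(const * max(n_F * math.log(N / delta1),
                              n_F * math.log(n_F / delta1) ** 2))
    m = min(m, 2 * N + 1)                       # cannot exceed the 2N+1 available times
    return sorted(rng.sample(range(-N, N + 1), m))

def hadamard_signal(energies, weights, tau, n, shots, rng=random):
    """Idealized estimate of <Phi| e^{-iH n tau} |Phi> from +/-1 Hadamard-test shots.

    `energies` and `weights` define a hypothetical, classically simulated spectrum;
    the weights should sum to 1 so both parts lie in [-1, 1].
    """
    g = sum(w * cmath.exp(-1j * e * n * tau) for e, w in zip(energies, weights))

    def shot_mean(mean):
        # each shot is +1 with probability (1 + mean) / 2 and -1 otherwise
        return sum(1 if rng.random() < (1 + mean) / 2 else -1
                   for _ in range(shots)) / shots

    return complex(shot_mean(g.real), shot_mean(g.imag))
```

A shot count matching the algorithm would be something like `math.ceil(c * math.log(len(omega) / delta2))` for some constant `c`. The same hedge applies: the constant is not specified in the source.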



Appendix C Concentration of $X_{n}$