License: CC BY 4.0
arXiv:2312.00733v1 [quant-ph] 01 Dec 2023

Provable bounds for noise-free expectation values computed from noisy samples

Samantha V. Barron IBM Quantum, IBM T.J. Watson Research Center    Daniel J. Egger IBM Quantum, IBM Research Europe - Zurich    Elijah Pelofske Los Alamos National Laboratory    Andreas Bärtschi Los Alamos National Laboratory    Stephan Eidenbenz Los Alamos National Laboratory    Matthis Lehmkuehler University of Basel    Stefan Woerner IBM Quantum, IBM Research Europe - Zurich
(December 1, 2023)
Abstract

In this paper, we explore the impact of noise on quantum computing, particularly focusing on the challenges when sampling bit strings from noisy quantum computers as well as the implications for optimization and machine learning applications. We formally quantify the sampling overhead to extract good samples from noisy quantum computers and relate it to the layer fidelity, a metric to determine the performance of noisy quantum processors. Further, we show how this allows us to use the Conditional Value at Risk of noisy samples to determine provable bounds on noise-free expectation values. We discuss how to leverage these bounds for different algorithms and demonstrate our findings through experiments on a real quantum computer involving up to 127 qubits. The results show a strong alignment with theoretical predictions.

I Introduction

Quantum computing is a new computational paradigm which promises to impact many disciplines, ranging from quantum chemistry peruzzo_2014_vqe ; ollitrault_2021_dynamics , quantum physics dimeglio2023quantum , and material sciences barkoutsos_2021_alchemical , to machine learning Havlicek2019 ; Zoufal_2019_qgan ; Zoufal_2021_varqbm , optimization farhi_2014_qaoa ; Bravyi2019 ; egger2021warm ; Sack2023 , and finance Woerner_2019_risk ; yndurain_2019_quantum_finance ; Stamatopoulos_2022_market_risk . However, leveraging near-term quantum computers is difficult due to the noise present in the systems. Ultimately, this needs to be addressed by quantum error correction, which exponentially suppresses errors by encoding logical qubits in multiple physical qubits nielsen_and_chuang ; lidar_brun_2013_qec .

In near-term devices, implementing error correction is infeasible. We must find other ways to handle the noise. A promising approach to bridge the gap between noisy and error-corrected quantum computing is error mitigation. Here, we leverage multiple noisy estimates to construct a better approximation of the noise-free result. The most prominent examples are Probabilistic Error Cancellation (PEC) berg2023probabilistic ; Piveteau_2022 and Zero Noise Extrapolation (ZNE) Temme_2017 . While error mitigation in general scales exponentially quek2023exponentially , a combination of PEC and ZNE has been impressively demonstrated recently in a 127-qubit experiment at a circuit depth beyond the reach of exact classical methods kim_2023_utility ; anand2023classical . The rate of the exponential cost of error mitigation directly relates to the errors in the quantum devices. It is expected that these errors can be reduced to a level that noisy devices with error mitigation can already perform practically relevant tasks even before error correction Bravyi_2022 . PEC and ZNE mitigate the errors in expectation values. While this finds many applications, e.g., in quantum chemistry and physics, most quantum optimization farhi_2014_qaoa ; egger2021warm ; zoufal_2023_blackbox and many quantum machine learning algorithms Zoufal_2019_qgan ; letcher2023tight build directly on top of measured samples from a quantum computer. In optimization, having access to an objective value but not the samples corresponds to knowing the value of an optimal solution but not how to realize it. Getting these samples is thus a key problem to scale sample-based algorithms on noisy hardware.

In this paper, we discuss the impact of noise on sampling bit strings from a noisy quantum computer and quantify the sampling overhead required to extract good solutions from noisy devices, e.g., in the context of optimization. Furthermore, we connect our findings to the Conditional Value at Risk (CVaR, also known as Expected Shortfall), an alternative loss function introduced in Ref. barkoutsos_2020_cvar . We show that CVaR is robust against noise and can generate meaningful results from noisy samples, including for expectation values. This feature was already conjectured in Ref. barkoutsos_2020_cvar but has not been shown formally. Our work closes this gap and shows that CVaR evaluated on noisy samples achieves provable bounds on noise-free observables. We demonstrate these bounds on up to 127 qubits on a real quantum computer applied to optimization problems, where we find close agreement between the experiments and theory. In particular, this allows us to apply the known noise-free performance bounds for the Quantum Approximate Optimization Algorithm (QAOA) for MAXCUT on 3-regular graphs farhi_2014_qaoa ; wurtz_2021_qaoa . Thus, our work results in provable performance guarantees for a variational algorithm even on noisy hardware.

The remainder of this paper is organized as follows. First, Sec. II discusses the impact of noise on sampling and how to quantify it. Then, Sec. III formally defines the CVaR and shows that it can provide provable bounds to noise-free expectation values from noisy samples. Afterwards, Sec. IV discusses the implications of the presented results in the context of applications in optimization, machine learning, and quantum time evolution. Sec. V demonstrates the results on a real quantum computer with up to 127 qubits, where we find close agreement with the theory. Last, Sec. VI concludes the paper and discusses open questions for further research.

II Sampling from Noisy Quantum Computers

Suppose an initial $n$-qubit quantum state $\rho_0$, a quantum operation $\mathcal{U}(\cdot) = U \cdot U^{\dagger}$, and the resulting $\rho = \mathcal{U}(\rho_0)$. On a real quantum computer, we usually do not have access to the ideal operation $\mathcal{U}$ but only to a noisy version $\widetilde{\mathcal{U}}$ which we model by $\mathcal{U} \circ \Lambda$. Here, $\Lambda$ denotes the noise model. We denote the resulting noisy state by $\widetilde{\rho} = \widetilde{\mathcal{U}}(\rho_0)$.

For simplicity, we assume the Pauli-Lindblad noise model introduced in Ref. berg2023probabilistic

$$\Lambda(\rho) = \prod_{k \in \mathcal{K}} \left( w_k\,(\cdot) + (1 - w_k)\, P_k (\cdot) P_k \right) \rho. \tag{1}$$

Here, $\mathcal{K}$ denotes the index set for (local) Pauli error terms $P_k$, and $w_k = (1 + e^{-2\lambda_k})/2$ for corresponding model coefficients $\lambda_k$ that determine the strength of the noise. The assumption of Pauli noise can usually be justified via Pauli twirling knill_randomized_2008 ; dankert_exact_2009 ; magesan_scalable_2011 . In Appendix A we discuss Pauli twirling and the assumption of a Pauli noise model in more detail.

In general, a quantum circuit is not a single operation $\mathcal{U}$ but a concatenation of layers $\mathcal{U}_i$, $i = 1, \ldots, l$. Their noisy versions are $\widetilde{\mathcal{U}}_i$ with corresponding noise models $\Lambda_i$. Crucially, this allows us to learn the noise model for each layer independently berg2023probabilistic . A common assumption is that the layers $\mathcal{U}_i$ consist of non-overlapping CNOT gates (or other hardware-native two-qubit Clifford gates) and that these layers possibly alternate with layers of single-qubit gates. Single-qubit gates are assumed to be noise-free since their errors are an order of magnitude smaller than those of two-qubit gates. Therefore, only the noise of the two-qubit gate layers is considered.

Assuming the above layer structure and that the noise model of the quantum processor is sparse allows Ref. berg2023probabilistic to introduce a protocol to efficiently learn the model coefficients $\lambda_k$. A property of $\Lambda$ that characterizes the overall strength of the noise is $\gamma = e^{2\sum_k \lambda_k}$. This has a direct operational interpretation, since $\gamma^2$ defines the sampling overhead of applying PEC to mitigate the noise in the context of estimating an expectation value Temme_2017 ; berg2023probabilistic .

Here, we first focus on sampling from noisy quantum computers instead of estimating expectation values. Suppose we prepare a quantum state and afterwards measure the qubits. Then, the probability to sample a bit string $x \in \{0,1\}^n$ is given by $p_x = \operatorname{tr}(\rho\, |x\rangle\!\langle x|)$ for the noise-free state $\rho$ and by $\widetilde{p}_x = \operatorname{tr}(\widetilde{\rho}\, |x\rangle\!\langle x|)$ for the noisy state $\widetilde{\rho}$. The noise model introduced in Eq. (1) can also be interpreted as follows: with a probability of $1/\sqrt{\gamma} = \prod_k w_k$ we sample a bit string from $\rho$ and with probability $1 - 1/\sqrt{\gamma}$ we sample from a state where at least one error occurred. Here, we assume $\lambda_k \ll 1$ such that we can leverage $e^x = 1 + x + \mathcal{O}(x^2)$.
It immediately follows that $w_k = e^{-\lambda_k} + \mathcal{O}(\lambda_k^2)$, and thus, $1/\sqrt{\gamma} = \prod_k w_k$. Then, the law of total probability kokosaka_2000_probability implies the lower bound:

$$\widetilde{p}_x \geq p_x / \sqrt{\gamma}. \tag{2}$$

In other words, if a noise-free state $\rho$ has probability $p_x$ to sample a bit string of interest $x$, then, if $\rho$ is approximated by $\widetilde{\rho}$ prepared through a noisy process characterized by $\gamma$, we need a multiplicative sampling overhead of $\sqrt{\gamma}$ to guarantee at least the same probability of sampling $x$ as for the noise-free state. Thus, as long as we are only interested in generating relevant bit strings that we can efficiently evaluate classically, we can deal with the noise by measuring $\sqrt{\gamma}$-times more often. This is in contrast to the multiplicative sampling overhead $\gamma^2$ introduced by PEC when we are interested in estimating expectation values. Interestingly, if we apply PEC and then determine only the sampling probabilities, without evaluating an expectation value, we find that the sampling probabilities are lower bounded by $p_x/\gamma$, i.e., PEC “amplifies” the noise to achieve an unbiased estimation of expectation values, see Appendix B for more details.
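As a rough numerical illustration of the quantities above, the following Python sketch computes $\gamma$, the weights $w_k$, and the resulting overheads from a list of model coefficients $\lambda_k$. The helper name `noise_summary` and the example coefficients are assumptions made for illustration; they are not taken from the paper or from any library.

```python
import math


def noise_summary(lambdas):
    """Summarize a Pauli-Lindblad noise model given its coefficients lambda_k.

    Hypothetical helper for illustration only.
    """
    # gamma = exp(2 * sum_k lambda_k), the overall noise strength.
    gamma = math.exp(2.0 * sum(lambdas))
    # w_k = (1 + exp(-2 * lambda_k)) / 2, probability that error k does not occur.
    weights = [(1.0 + math.exp(-2.0 * lam)) / 2.0 for lam in lambdas]
    return {
        "gamma": gamma,
        "p_no_error": math.prod(weights),          # prod_k w_k
        "inv_sqrt_gamma": 1.0 / math.sqrt(gamma),  # small-lambda approximation of prod_k w_k
        "sampling_overhead": math.sqrt(gamma),     # overhead when only sampling bit strings
        "pec_overhead": gamma ** 2,                # overhead of PEC for expectation values
    }


# Example: 50 error terms with lambda_k = 0.01 each (made-up values).
summary = noise_summary([0.01] * 50)
```

Since $w_k = e^{-\lambda_k}\cosh(\lambda_k) \geq e^{-\lambda_k}$, the product $\prod_k w_k$ is never smaller than $1/\sqrt{\gamma}$, so using $1/\sqrt{\gamma}$ in Eq. (2) stays on the conservative side. For the example values, $\gamma = e \approx 2.72$, which gives a sampling overhead of about 1.65 compared to a PEC overhead of about 7.39.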

The sampling overhead $\sqrt{\gamma}$ can be derived from the noise model resulting from the noise learning protocol introduced in Ref. berg2023probabilistic . However, in the present context, we are not interested in the full description of the noise model, only in $\gamma$. Recently, Ref. mckay2023benchmarking introduced the Layer Fidelity (LF), a metric to measure noise present in the hardware when executing a circuit. The LF also assumes the layered gate structure mentioned above and determines the resulting fidelity for each layer of gates. It has a direct connection to the sampling overhead via $\text{LF}_i = 1/\sqrt{\gamma_i}$, where $\gamma_i$ characterizes the noise of layer $i$. For multiple layers we can thus rewrite Eq. (2) as

$$\widetilde{p}_x \geq p_x \prod_i \text{LF}_i. \tag{3}$$

Further, the LF has the advantage that it is very cheap to evaluate compared to learning the full noise model. Thus, for a given circuit, the LF allows us to efficiently determine the sampling overhead needed to compensate for the noise.
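The shot budget implied by Eq. (3) can be sketched in a few lines of Python. The function `required_shots` and the example layer fidelities are hypothetical and chosen only to illustrate the arithmetic; in practice the per-layer values would come from a layer fidelity benchmark.

```python
import math


def required_shots(noise_free_shots, layer_fidelities):
    """Number of noisy shots suggested by Eq. (3) to match a noise-free budget.

    Hypothetical helper: the name and interface are assumptions for illustration.
    """
    # prod_i LF_i plays the role of 1 / sqrt(gamma) for the whole circuit.
    total_fidelity = math.prod(layer_fidelities)
    if not 0.0 < total_fidelity <= 1.0:
        raise ValueError("layer fidelities must multiply to a value in (0, 1]")
    # Multiplicative sampling overhead sqrt(gamma) = 1 / prod_i LF_i.
    overhead = 1.0 / total_fidelity
    return math.ceil(noise_free_shots * overhead), overhead


# Two layers with made-up fidelities 0.9 and 0.8, and a noise-free budget of 1000 shots.
shots, overhead = required_shots(1000, [0.9, 0.8])
```

With these example numbers the product of layer fidelities is 0.72, so roughly 1.39 times as many shots are needed (1389 instead of 1000).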

Other types of errors that we have not mentioned so far are state preparation and measurement (SPAM) errors. In principle, we can also determine a sampling overhead and compensate for the SPAM errors by increasing the number of samples. However, particularly for measurement errors, there exist other protocols which might allow for statistical corrections with a smaller sampling overhead van_den_Berg_2022_trex ; Nation_2021_m3 . A systematic study of these types of errors would be interesting for future research.

III Conditional Value-at-Risk

Section II shows that we can sample bit strings of interest $x$, i.e., corresponding to the noise-free state $\rho$, by taking $\sqrt{\gamma}$-times more samples from the noisy state $\widetilde{\rho}$. However, we usually do not know which samples correspond to the noise-free state and which samples were affected by noise. We now leverage the insight of Sec. II and show that the CVaR can provide provable bounds to noise-free expectation values from noisy samples. The CVaR has already been suggested as a loss function and observable in Ref. barkoutsos_2020_cvar , however, only based on intuition and without theoretical justification.

Consider an integrable real-valued random variable $X$ with cumulative distribution function $F_X : \mathbb{R} \rightarrow [0,1]$. Then, the (lower) CVaR at level $\alpha \in (0,1]$ is defined as

$$\operatorname{CVaR}_{\alpha}(X) = \alpha^{-1}\, \mathbb{E}[X; X \leq x_{\alpha}] + x_{\alpha} \left(1 - \alpha^{-1}\, \mathbb{P}[X \leq x_{\alpha}]\right),$$

where $x_{\alpha} = \inf\{x \in \mathbb{R} \colon F_X(x) \geq \alpha\}$. In the case when $F_X(x_{\alpha}) = \alpha$, this definition simplifies to $\operatorname{CVaR}_{\alpha}(X) = \mathbb{E}[X \mid X \leq x_{\alpha}]$, i.e., we are considering the expectation of $X$ when we are conditioning $X$ to take values in its bottom $\alpha$ quantile. Accordingly, we define the upper CVaR as

$$\overline{\operatorname{CVaR}}_{\alpha}(X) = -\operatorname{CVaR}_{\alpha}(-X). \tag{4}$$

Therefore we are considering the expectation of $X$ conditioned on values in its upper $\alpha$ quantile. This allows us to prove the following lemma.

Lemma 1.

Suppose a random variable $X$ with probabilities $p_x = \mathbb{P}[X = x]$ for $x \in \mathbb{R}$. Further, suppose another random variable $\widetilde{X}$ as well as a given constant $C \geq 1$ such that $\widetilde{p}_x = \mathbb{P}[\widetilde{X} = x] \geq p_x / C$. Then we have

$$\operatorname{CVaR}_{\alpha}(\widetilde{X}) \leq \mathbb{E}[X] \leq \overline{\operatorname{CVaR}}_{\alpha}(\widetilde{X}), \tag{5}$$

for all $\alpha \leq 1/C$. Thus, the lower and upper CVaR of $\widetilde{X}$ with $\alpha \leq 1/C$ define lower and upper bounds, respectively, of the expectation value of $X$.

Proof.

By monotonicity of $\operatorname{CVaR}_{\alpha}(\widetilde{X})$ in $\alpha$, it suffices to show the claim for $\alpha = 1/C$. Let $x_1 < \cdots < x_n$ denote the support of $\widetilde{p}$. Take $k \leq n$ such that $\sum_{i \leq k-1} \widetilde{p}_{x_i} < 1/C \leq \sum_{i \leq k} \widetilde{p}_{x_i}$, then

$$\operatorname{CVaR}_{1/C}(\widetilde{X}) = C \sum_{i \leq k} x_i\, \widetilde{p}_{x_i} + x_k \left(1 - C \sum_{i \leq k} \widetilde{p}_{x_i}\right).$$

Clearly, the $p$ minimizing $\mathbb{E}[X] = \sum_x x\, p_x$ and satisfying $p_x \leq C \widetilde{p}_x$ for all $x$ is also supported on $\{x_1, \ldots, x_n\}$ and satisfies

$$\begin{aligned} p_{x_i} &= C \widetilde{p}_{x_i} \text{ for all } i < k, \text{ and} \\ p_{x_k} &\leq 1 - \sum_{i<k} p_{x_i} = 1 - C \sum_{i<k} \widetilde{p}_{x_i}. \end{aligned}$$

From this, the claim is immediate by using the above to lower bound $\mathbb{E}[X]$. The upper bound follows by applying the lower bound to $-X$ and $-\widetilde{X}$ in place of $X$ and $\widetilde{X}$. ∎
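To make the definition of the CVaR and the statement of Lemma 1 concrete, the following Python sketch evaluates the lower and upper CVaR exactly for a finite discrete distribution and checks Eq. (5) on a small example. The function names, the example distribution, and the choice of the noisy distribution as a mixture $\widetilde{p} = p/C + (1 - 1/C)\,q$ with a uniform $q$ are assumptions made for illustration; any $\widetilde{p}$ with $\widetilde{p}_x \geq p_x/C$ would do.

```python
def lower_cvar(values, probs, alpha):
    """Exact lower CVaR at level alpha of a finite discrete distribution.

    Averages the values over the lowest alpha probability mass, which matches
    the definition with the correction term at the quantile x_alpha.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    remaining = alpha
    total = 0.0
    for i in order:
        # Take as much mass as is still needed from the current value.
        take = min(probs[i], remaining)
        total += take * values[i]
        remaining -= take
        if remaining <= 0:
            break
    return total / alpha


def upper_cvar(values, probs, alpha):
    """Upper CVaR via Eq. (4): minus the lower CVaR of the negated variable."""
    return -lower_cvar([-v for v in values], probs, alpha)


# Made-up example: noise-free distribution p, uniform error distribution q, C = 2.
values = [-2.0, -1.0, 0.0, 1.0]
p = [0.7, 0.2, 0.1, 0.0]
q = [0.25, 0.25, 0.25, 0.25]
C = 2.0
p_noisy = [pi / C + (1.0 - 1.0 / C) * qi for pi, qi in zip(p, q)]

mean_noise_free = sum(v * pi for v, pi in zip(values, p))
lower = lower_cvar(values, p_noisy, 1.0 / C)
upper = upper_cvar(values, p_noisy, 1.0 / C)
```

For this example the noise-free expectation value is $-1.6$, while the noisy lower and upper CVaR at $\alpha = 1/C = 0.5$ are $-1.95$ and $-0.15$, consistent with Eq. (5).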

Next, let us consider again a noise-free $n$-qubit quantum state $\rho$, its noisy version $\widetilde{\rho}$, and the corresponding $\gamma$. Further, suppose a diagonal Hamiltonian $H$, which can also be interpreted as a function $h : \{0,1\}^n \rightarrow \mathbb{R}$. Let us define the random variables $X, \widetilde{X} \in \{0,1\}^n$ as the result of measuring $\rho$ and $\widetilde{\rho}$, respectively. Then, Lemma 1 and Eq. (2) immediately imply

$$\operatorname{CVaR}_{\alpha}(h(\widetilde{X})) \leq \mathbb{E}[h(X)] \leq \overline{\operatorname{CVaR}}_{\alpha}(h(\widetilde{X})), \tag{6}$$

for all $\alpha \leq 1/\sqrt{\gamma}$. Since, for a diagonal $H$, we have $\operatorname{tr}(\rho H) = \mathbb{E}[h(X)]$, Eq. (6) implies that the lower/upper CVaR computed from noisy samples of $\widetilde{\rho}$ provide lower/upper bounds for the noise-free expectation value of $\rho$. Further, suppose $\rho$ is the ground state of the diagonal $H$. Then, $h(\widetilde{X})$ cannot achieve any values smaller than $\operatorname{tr}(\rho H)$ and the left inequality in Eq. (6) is an equality. Thus, the noisy lower CVaR is equal to the ground state energy (similarly for the upper CVaR if $\rho$ corresponded to the maximally excited state of $H$). Further, we also know that if the noisy CVaR equals the ground state energy, the fidelity between the noise-free state $\rho$ and the noisy state $\widetilde{\rho}$ is lower bounded by the considered $\alpha$, i.e., $\mathcal{F}(\rho, \widetilde{\rho}) \geq \alpha$.
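In practice, the bounds in Eq. (6) are evaluated on a finite set of measured bit strings. The Python sketch below shows one simple way to do this for a MAXCUT-type diagonal cost function. The estimator (averaging the $\lceil \alpha N \rceil$ smallest or largest values), the sign convention $h(x) = -(\text{cut size})$, the toy graph, the samples, and the value of $\gamma$ are all assumptions made for illustration and are not taken from the experiments in this paper.

```python
import math


def maxcut_energy(bits, edges):
    """Diagonal cost h(x): minus the number of edges cut by bit string x."""
    return -sum(1 for i, j in edges if bits[i] != bits[j])


def cvar_bounds_from_samples(samples, h, alpha):
    """Estimate lower and upper CVaR of h over measured bit strings.

    Simple estimator (an assumption of this sketch): average the
    ceil(alpha * N) smallest and largest values of h, respectively.
    """
    energies = sorted(h(x) for x in samples)
    k = max(1, math.ceil(alpha * len(energies)))
    lower = sum(energies[:k]) / k
    upper = sum(energies[-k:]) / k
    return lower, upper


# Toy data: a triangle graph and four made-up measurement outcomes.
edges = [(0, 1), (1, 2), (0, 2)]
samples = ["010", "000", "110", "111"]
gamma = 4.0  # assumed noise strength, so alpha = 1 / sqrt(gamma) = 0.5
lower, upper = cvar_bounds_from_samples(
    samples, lambda x: maxcut_energy(x, edges), alpha=1.0 / math.sqrt(gamma)
)
```

Here the two best samples both cut two edges, so the estimated lower CVaR is $-2$, and the two worst samples cut none, so the estimated upper CVaR is $0$. With finite samples these are estimates of the bounds, subject to statistical fluctuations.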

Diagonal Hamiltonians arise, e.g., in optimization problems or in the form of projectors $|x\rangle\!\langle x|$, as can be used, e.g., for fidelity estimations. We will discuss these applications in more detail in Sec. IV.1 and Sec. IV.2. However, many applications also involve non-diagonal Hamiltonians, most prominently applications in quantum chemistry and physics peruzzo_2014_vqe . Suppose a non-diagonal Hamiltonian $H = \sum_i c_i P_i$, where $P_i$ denote Pauli terms and $c_i$ the corresponding weights. Then, we can decompose $H$ into a sum of Hamiltonians consisting of subsets of commuting Pauli strings $H = \sum_j H_j$. All Pauli terms in $H_j$ can be simultaneously diagonalized via single qubit Pauli rotations. Thus, we can assume the $H_j$ are diagonal without loss of generality.
We define the corresponding functions $h_j : \{0,1\}^n \rightarrow \mathbb{R}$ as well as noise-free and noisy random variables $X_j, \widetilde{X}_j$, respectively, resulting from measuring the quantum states with the corresponding post-rotations to diagonalize the Hamiltonians $H_j$. This implies

$$\sum_j \operatorname{CVaR}_{\alpha}(h_j(\widetilde{X}_j)) \leq \operatorname{tr}(\rho H) \leq \sum_j \overline{\operatorname{CVaR}}_{\alpha}(h_j(\widetilde{X}_j)), \tag{7}$$

for all $\alpha \leq 1/\sqrt{\gamma}$, which extends the previous result to non-diagonal Hamiltonians. Note that in contrast to diagonal Hamiltonians, we can no longer draw conclusions about the ground state energy or the fidelity between the noisy state and the ground state. For instance, the lower bound in Eq. (7) can be strictly smaller than the ground state energy.
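A small worked example may help to see why the lower bound in Eq. (7) can drop below the ground state energy. The Python sketch below considers the single-qubit Hamiltonian $H = X + Z$, which splits into the two groups $H_1 = X$ and $H_2 = Z$, and evaluates the sum of CVaRs for its exact ground state. The Hamiltonian, the noise-free setting, the level $\alpha = 0.5$, and all function names are assumptions chosen for illustration.

```python
import math


def lower_cvar(values, probs, alpha):
    """Exact lower CVaR at level alpha of a finite discrete distribution."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    remaining, total = alpha, 0.0
    for i in order:
        take = min(probs[i], remaining)
        total += take * values[i]
        remaining -= take
        if remaining <= 0:
            break
    return total / alpha


def upper_cvar(values, probs, alpha):
    """Upper CVaR as minus the lower CVaR of the negated variable, Eq. (4)."""
    return -lower_cvar([-v for v in values], probs, alpha)


def grouped_cvar_bounds(groups, alpha):
    """Sum the per-group lower and upper CVaR as in Eq. (7).

    Each group is a pair (values, probs) describing h_j of the measured outcome.
    """
    lower = sum(lower_cvar(v, p, alpha) for v, p in groups)
    upper = sum(upper_cvar(v, p, alpha) for v, p in groups)
    return lower, upper


# Ground state of H = X + Z: <X> = <Z> = -1/sqrt(2), ground state energy -sqrt(2).
p_minus = (1.0 + 1.0 / math.sqrt(2.0)) / 2.0  # probability of measuring -1
group = ([-1.0, 1.0], [p_minus, 1.0 - p_minus])
lower, upper = grouped_cvar_bounds([group, group], alpha=0.5)
ground_state_energy = -math.sqrt(2.0)
```

Each group contributes a lower CVaR of $-1$, so the lower bound is $-2$, which is strictly below the ground state energy $-\sqrt{2} \approx -1.41$, while the upper bound evaluates to $2 - 2\sqrt{2} \approx -0.83$. The bounds remain valid, but the lower bound no longer certifies anything about the ground state.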

The CVaR can be estimated using Monte Carlo sampling. The variance of this estimator depends on the type of distribution considered but is always bounded by $\mathcal{O}(1/\alpha^2)$. However, for Normal and Bernoulli distributions, for instance, it can be shown that in the present context the variance of the CVaR estimator behaves as $\mathcal{O}(1/\alpha)$ for $\alpha \rightarrow 0$, where for the Bernoulli case we assume that the success probability $p$ satisfies $p = \mathcal{O}(1/\sqrt{\gamma})$, which is the relevant case for the applications we consider later on, cf. Sec. IV.2. The derivations of the variance bounds for CVaR estimation are provided in Appendix C. Thus, in these cases and for $\alpha = 1/\sqrt{\gamma}$, the variance increases as $\mathcal{O}(\sqrt{\gamma})$. This renders the CVaR a very promising noise-robust loss function for variational quantum algorithms. The variance is amplified significantly less than for PEC, where it increases as $\mathcal{O}(\gamma^2)$. However, we need to recall that PEC comes with much stronger theoretical guarantees, i.e., it provides an unbiased estimator instead of a bound. Thus, depending on the application, CVaR might not be applicable.
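The Monte Carlo estimation of the CVaR can be sketched as follows. This is a minimal illustration, assuming the estimator simply averages the $\lceil \alpha N \rceil$ smallest (lower CVaR) or largest (upper CVaR) of $N$ sampled objective values; the function names `cvar_lower` and `cvar_upper` and the toy data are hypothetical and not taken from the paper.

```python
import math
import numpy as np


def cvar_lower(values, alpha):
    """Average of the ceil(alpha * N) smallest sample values (lower tail)."""
    ordered = np.sort(np.asarray(values, dtype=float))
    k = max(1, math.ceil(alpha * len(ordered)))
    return float(ordered[:k].mean())


def cvar_upper(values, alpha):
    """Average of the ceil(alpha * N) largest sample values (upper tail)."""
    ordered = np.sort(np.asarray(values, dtype=float))
    k = max(1, math.ceil(alpha * len(ordered)))
    return float(ordered[-k:].mean())


# Hypothetical toy data: objective values h(x) of eight measured bit strings.
samples = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
lower = cvar_lower(samples, alpha=0.25)  # mean of the 2 smallest values
upper = cvar_upper(samples, alpha=0.25)  # mean of the 2 largest values
```

For $\alpha = 1$ both functions reduce to the plain sample mean, which reflects the monotonicity of the CVaR in $\alpha$.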

In the remainder of this section we discuss improvements to the lower and upper bounds for cases where we have more information about the noise-free state, i.e., properties that the bit strings measured from the noise-free state must have but that might not persist under noise. Examples of such properties are particle preservation in quantum chemistry Bonet_Monroig_2018_post_selection ; Choquette_2021 and constraint satisfaction in quantum optimization barkoutsos_2020_cvar .

Suppose a function $\mathcal{F} : \{0,1\}^n \rightarrow \{0,1\}$ determines whether a bit string $x$ has a required property, where $\mathcal{F}(x) = 1$ indicates the presence of the property. Further, suppose a given Hamiltonian $H$ and, for simplicity, let us assume it is diagonal and defined by a function $h : \{0,1\}^n \rightarrow \mathbb{R}$. From this, we can construct a modified Hamiltonian $H_{\mathcal{F}}^{M}$ defined by the function

$$h_{\mathcal{F}}^{M}(x) = \begin{cases} h(x) & \text{if } \mathcal{F}(x) = 1, \\ M & \text{otherwise,} \end{cases} \tag{8}$$

where $M$ is a given constant. We thus have $\operatorname{tr}(\rho H) = \operatorname{tr}(\rho H_{\mathcal{F}}^{M})$ in the noise-free case for any $M$, since all noise-free samples $x$ satisfy $\mathcal{F}(x) = 1$. Next, we assume constants $M_l$ and $M_u$ that satisfy $M_l \leq h(x) \leq M_u$ for all $x$ with $\mathcal{F}(x) = 1$. Samples with $\mathcal{F}(x) = 0$ must be affected by noise, which allows us to filter out samples where the noise destroys the required property. Although there might still be noisy samples that are feasible, the post-selection reduces the impact of noise. Due to the equality of expectation values in the noise-free case and the choice of $M_l$ and $M_u$, we immediately get

$$\operatorname{CVaR}_{\alpha}(h_{\mathcal{F}}^{M_u}(\widetilde{X})) \leq \mathbb{E}[X] \leq \overline{\operatorname{CVaR}}_{\alpha}(h_{\mathcal{F}}^{M_l}(\widetilde{X})), \tag{9}$$

for all $\alpha \leq 1/\sqrt{\gamma}$. This can lead to significantly better bounds since we can leverage the additional information about the considered problem to filter out more noisy samples. For non-diagonal Hamiltonians, see Eq. (7), it is possible to define a filter function $\mathcal{F}_j$ for each $H_j$.
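The filtered bounds of Eq. (9) can be illustrated with a short sketch. It assumes the CVaR is estimated as the tail mean of the sampled values, replaces the value of every infeasible sample by $M_u$ for the lower bound and by $M_l$ for the upper bound, and uses hypothetical helper names (`filtered_values`, `tail_mean`, `filtered_cvar_bounds`) and toy data that do not appear in the paper.

```python
import math
import numpy as np


def filtered_values(bitstrings, h, feasible, M):
    """Evaluate h_F^M(x): h(x) for feasible bit strings, the constant M otherwise."""
    return np.array([h(x) if feasible(x) else M for x in bitstrings], dtype=float)


def tail_mean(values, alpha, upper):
    """Mean of the ceil(alpha * N) largest (upper=True) or smallest values."""
    ordered = np.sort(values)
    k = max(1, math.ceil(alpha * len(ordered)))
    return float(ordered[-k:].mean() if upper else ordered[:k].mean())


def filtered_cvar_bounds(bitstrings, h, feasible, M_l, M_u, alpha):
    """Sample estimates of the lower and upper bound in Eq. (9)."""
    lower = tail_mean(filtered_values(bitstrings, h, feasible, M_u), alpha, upper=False)
    upper = tail_mean(filtered_values(bitstrings, h, feasible, M_l), alpha, upper=True)
    return lower, upper


# Hypothetical example: feasible bit strings have Hamming weight 2 and
# h counts the ones in the first two positions, so that 0 <= h(x) <= 2.
def h(x):
    return float(x[0] + x[1])


def feasible(x):
    return sum(x) == 2


noisy = [(1, 1, 0, 0), (1, 0, 1, 0), (0, 0, 1, 1),
         (1, 1, 1, 0), (0, 0, 0, 0), (0, 1, 0, 1)]
lower, upper = filtered_cvar_bounds(noisy, h, feasible, M_l=0.0, M_u=2.0, alpha=0.5)
```

Assigning $M_u$ to infeasible samples pushes them out of the lower tail, and assigning $M_l$ pushes them out of the upper tail, so in both cases the retained samples are preferentially feasible ones.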

Another implication of our results is that the average over the post-selected noisy samples must lie between the lower and upper bounds resulting from the filtered CVaR, due to the monotonicity of CVaR with respect to $\alpha$. Thus, the CVaR allows us to bound the bias that post-selection may introduce and provides a quality measure for the estimated expectation value.

IV Applications

We now discuss the presented theory on sampling probabilities and CVaR in the context of different applications: first, quantum optimization farhi_2014_qaoa ; barkoutsos_2020_cvar ; egger2021warm ; zoufal_2023_blackbox ; weidenfeller2022scaling , and second, fidelity-based algorithms, such as Quantum Support Vector Machines (QSVM) Havlicek2019 ; gentinetta2022complexity ; gentinetta2023quantum as well as Variational Quantum Time Evolution (VarQTE) McArdle_2019_varqte ; Yuan_2019_varqte ; Zoufal_2021_varqbm ; Zoufal_2023_varqte_error_bounds ; Gacon_2021_qnspsa ; gacon2023stochastic ; gacon2023variational . These are illustrative examples; the theory presented here is applicable to many other domains, such as quantum chemistry and physics.

IV.1 (Variational) Quantum Optimization

Many variational quantum algorithms have been proposed to solve discrete optimization problems, such as Quadratic Unconstrained Binary Optimization (QUBO). Most of them have a similar structure and interpret every measured bit string as a potential solution to the problem. Proposals that derive variable values from expectation values Bravyi2019 ; fuller2021approximate ; teramoto2023quantumrelaxation ; patti2022variational are, however, not the focus of our work.

Suppose a generic unconstrained binary optimization problem of the form

$$\min_{x \in \{0,1\}^n} f(x), \tag{10}$$

where $f : \{0,1\}^n \rightarrow \mathbb{R}$ is an objective function on $n$ binary variables. For instance, a QUBO has $f(x) = x^T Q x$ with $Q \in \mathbb{R}^{n \times n}$. In the case of a QUBO, we can apply the change of variables $x_i = (1 - z_i)/2$ for $z_i \in \{-1, +1\}$, replace $z_i$ by the Pauli $Z_i$ matrix on qubit $i$ and products $z_i z_j$ by $Z_i \otimes Z_j$ to define a diagonal Hamiltonian $H$, and thereby translate Eq. (10) into a ground state problem lucas_2014_ising
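The change of variables $x_i = (1 - z_i)/2$ can be made concrete with a short sketch. It expands $x^T Q x$ into a constant offset, linear coefficients for $z_i$, and quadratic coefficients for $z_i z_j$ with $i \neq j$ (using $z_i^2 = 1$); the names `qubo_to_ising` and `ising_energy` and the example matrix are illustrative assumptions, not part of the paper.

```python
import itertools
import numpy as np


def qubo_to_ising(Q):
    """Return (offset, h, J) with x^T Q x = offset + sum_i h_i z_i + sum_{i != j} J_ij z_i z_j."""
    Q = np.asarray(Q, dtype=float)
    offset = (Q.sum() + np.trace(Q)) / 4.0
    h = -(Q.sum(axis=0) + Q.sum(axis=1)) / 4.0
    J = Q / 4.0
    np.fill_diagonal(J, 0.0)  # z_i^2 = 1 has been absorbed into the offset
    return offset, h, J


def ising_energy(offset, h, J, z):
    """Evaluate the Ising form for spins z in {-1, +1}^n."""
    z = np.asarray(z, dtype=float)
    return offset + h @ z + z @ J @ z


# Hypothetical 3-variable QUBO matrix used only to check the conversion.
Q = np.array([[1.0, -2.0, 0.0], [0.0, 3.0, 1.0], [2.0, 0.0, -1.0]])
offset, h, J = qubo_to_ising(Q)

# Brute-force comparison of both forms over all bit strings x, with z = 1 - 2x.
max_deviation = max(
    abs(np.array(x) @ Q @ np.array(x) - ising_energy(offset, h, J, 1 - 2 * np.array(x)))
    for x in itertools.product([0, 1], repeat=3)
)
```

Replacing each $z_i$ by $Z_i$ and each product $z_i z_j$ by $Z_i \otimes Z_j$ in the resulting expression gives the diagonal Hamiltonian $H$.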

$$\min_{\ket{\psi}} \langle \psi | H | \psi \rangle. \tag{11}$$

As mentioned in Sec. III, we can transform any generic function $f$ into a Hamiltonian $H$ whose diagonal element at the position of the computational basis state $\ket{x}$ is given by $f(x)$ zoufal_2023_blackbox .

Most variational quantum algorithms for binary optimization are defined via a parameterized ansatz $\ket{\psi(\theta)}$ with parameters $\theta \in \mathbb{R}^d$, a loss function $\mathcal{L}(\theta)$ that maps parameter values to a loss value, and an optimizer to solve

$$\min_{\theta \in \mathbb{R}^d} \mathcal{L}(\theta). \tag{12}$$

After the final parameters $\theta^*$ are determined, the resulting state $\ket{\psi(\theta^*)}$ is measured and the sampled bit strings are used as potential solutions to the problem. Samples obtained during the execution of the algorithm can also be considered as solutions in case they achieve better objective values than the final samples.

If we set $\mathcal{L}(\theta) = \langle \psi(\theta) | H | \psi(\theta) \rangle$ for some ansatz $\ket{\psi(\theta)}$, we get the Variational Quantum Eigensolver (VQE) peruzzo_2014_vqe . Further, if we define the ansatz as

$$\ket{\psi(\theta)} = \prod_{j=1}^{p} e^{-i H_X \beta_j} e^{-i H \gamma_j} \ket{+}, \tag{13}$$

we get the QAOA farhi_2014_qaoa , where $p$ defines the depth, $\beta_j, \gamma_j \in \mathbb{R}$ are the variational parameters, and $H_X = -\sum_{i=1}^{n} X_i$, where $X_i$ denotes the Pauli $X$ matrix on qubit $i$.
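For small $n$, the QAOA state of Eq. (13) can be simulated directly as a state vector. The sketch below is an illustration under stated assumptions: $H$ is diagonal with entries $f(x)$, the layers are applied in the order $j = 1, \ldots, p$, and $\ket{+}$ denotes the uniform superposition over all $n$ qubits. The function name `qaoa_state` and the single-qubit example are hypothetical.

```python
import numpy as np


def qaoa_state(f_values, betas, gammas):
    """State vector of Eq. (13) for a diagonal H with entries f_values (length 2**n)."""
    f_values = np.asarray(f_values, dtype=float)
    n = len(f_values).bit_length() - 1  # number of qubits, assuming len == 2**n
    state = np.full(2**n, 2 ** (-n / 2), dtype=complex)  # uniform superposition
    for beta, gamma in zip(betas, gammas):
        # Phase separator e^{-i H gamma}: an elementwise phase since H is diagonal.
        state = np.exp(-1j * gamma * f_values) * state
        # Mixer e^{-i H_X beta} with H_X = -sum_i X_i factorizes into
        # e^{+i beta X_i} = cos(beta) I + i sin(beta) X_i on every qubit.
        psi = state.reshape([2] * n)
        for q in range(n):
            psi = np.cos(beta) * psi + 1j * np.sin(beta) * np.flip(psi, axis=q)
        state = psi.reshape(-1)
    return state


# Hypothetical single-qubit example with f(0) = 0 and f(1) = 1.
f_values = np.array([0.0, 1.0])
state = qaoa_state(f_values, betas=[np.pi / 4], gammas=[np.pi / 2])
probabilities = np.abs(state) ** 2
expectation = float(probabilities @ f_values)  # <psi|H|psi>
```

Sampling bit strings from `probabilities` and evaluating $f$ on them yields the noise-free random variable whose expectation value the CVaR bounds refer to.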

The results from Sec. II and III immediately apply to QAOA. Suppose we already have a quantum circuit that, when executed and measured in an ideal noise-free setting, produces good solutions to a considered optimization problem. Sec. II immediately implies that, when executed on a noisy device, a sampling overhead of $\sqrt{\gamma}$ is sufficient to extract solutions of the same quality as in the noise-free case. In certain cases it might be feasible to determine $\theta^*$ classically streif2019training ; sack2021quantum and only use the quantum computer to sample good solutions, since evaluating (local) expectation values might be easier than sampling from the full circuit begusic023simulating . However, in cases where we must train the parameterized quantum circuit we can replace the expectation value by the CVaR barkoutsos_2020_cvar . The results introduced in Sec. III now provide guidance on how to choose $\alpha$ and the required sampling overhead to get good results from a noisy device. We illustrate this on concrete examples in Sec. V.1 and Sec. V.2.

Our results allow us to apply proven performance guarantees for QAOA without noise to noisy hardware. For MAXCUT on 3-regular graphs, QAOA achieves a worst-case approximation ratio of $0.692$ for $p=1$ farhi_2014_qaoa , $0.7559$ for $p=2$, and (under certain assumptions) $0.7924$ for $p=3$ wurtz_2021_qaoa . With a $\sqrt{\gamma}$ sampling overhead these guarantees are recovered even in the noisy regime. Furthermore, for 3-regular graphs, we can always train QAOA with $p \leq 3$ classically by simulating at most 30 qubits at a time Sack2023 , i.e., we can determine the optimal parameters via classical simulation and then sample good solutions with a $\sqrt{\gamma}$ overhead from the quantum computer. Since $\gamma$ grows exponentially with the circuit size, the sampling overhead introduced to combat noise may exceed the cost of a brute-force search. A simple back-of-the-envelope calculation, discussed in Appendix D, determines the minimum layer fidelity required to apply a depth-$p$ QAOA.

The Quantum Alternating Operator Ansatz (QAOA’) is an alternative to QAOA hadfield_quantum_2019 . Here, a constraint, e.g., a fixed Hamming weight (i.e., a fixed number of ones in a bit string), is enforced by changing the mixer to preserve such states wang2020xymixers ; cook2020vertexcover ; golden2023numerical and starting in (a superposition of) feasible states baertschi2022shortdepth ; baertschi2020grover . Thus, if QAOA’ is executed noise-free, all resulting samples satisfy the given constraint. This is an example where a filter function $\mathcal{F}$, as introduced in Sec. III, helps to improve the CVaR bounds on the corresponding expectation value.

IV.2 Fidelities

Several quantum algorithms leverage fidelity estimation between two quantum states in a sub-routine. In the following, we first discuss how to leverage the CVaR bounds to approximate fidelities on noisy quantum computers and then how this impacts two concrete classes of algorithms: QSVMs and VarQTE.

Suppose we have $n$-qubit quantum circuits $U$ and $V$ that define $\ket{\psi} = U\ket{0}$ and $\ket{\phi} = V\ket{0}$, respectively. A common approach to estimate the fidelity between $\ket{\psi}$ and $\ket{\phi}$ is the compute-uncompute method given by

$$\mathcal{F}(\ket{\psi}, \ket{\phi}) = \left| \langle 0 | V^{\dagger} U | 0 \rangle \right|^2. \tag{14}$$

$\mathcal{F}$ is thus the probability of measuring $\ket{0}$ for the state $V^{\dagger}U\ket{0}$. This also equals the expectation value $\operatorname{tr}(\rho H)$ for the state $\rho = V^{\dagger}U\ket{0}$ and the diagonal Hamiltonian $H = |0\rangle\!\langle 0|$. Thus, we can use $\overline{\operatorname{CVaR}}$ to get an upper bound of the noise-free fidelity. Here, the resulting random variable follows a Bernoulli distribution, as the expectation value counts the number of measured $\ket{0}$’s and ignores all other outcomes. Since the variance of the CVaR for a Bernoulli random variable scales with $1/\alpha$, see Sec. III, we can set $\alpha = 1/\sqrt{\gamma}$ and use Eq. (6) to upper bound the fidelity with a sampling overhead of $\sqrt{\gamma}$ compared to the $\gamma^2$ required by PEC to get an unbiased estimation.
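For this Bernoulli case the upper CVaR has a simple closed form in terms of measurement counts, sketched below. The sketch assumes the CVaR is estimated as the mean of the $\lceil \alpha N \rceil$ largest of $N$ samples of the 0/1 indicator of the all-zero outcome, with $\alpha = 1/\sqrt{\gamma}$; the function name `fidelity_upper_bound` and the numbers in the example are hypothetical.

```python
import math


def fidelity_upper_bound(num_zero_outcomes, shots, gamma):
    """Upper CVaR of the all-zero indicator at alpha = 1/sqrt(gamma)."""
    alpha = 1.0 / math.sqrt(gamma)
    kept = max(1, math.ceil(alpha * shots))  # size of the retained upper tail
    # The retained tail contains as many ones as available, padded with zeros.
    return min(num_zero_outcomes, kept) / kept


# Hypothetical example: 2000 all-zero outcomes in 10000 shots with gamma = 16.
noisy_estimate = 2000 / 10000
upper_bound = fidelity_upper_bound(2000, 10000, gamma=16.0)
```

In this example the noisy estimate of $0.2$ is rescaled by $\sqrt{\gamma} = 4$, and the bound saturates at one whenever the number of all-zero outcomes exceeds the size of the retained tail.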

QSVMs leverage a quantum feature map to define a quantum kernel and provably outperform classical computers on certain tasks Liu_2021 . The quantum feature map is a parameterized quantum circuit that takes a classical feature vector $x$ as an input to prepare a corresponding quantum state $\ket{\phi(x)}$. The corresponding quantum kernel is then defined via the Hilbert-Schmidt inner product of $\ket{\phi(x_1)}$ and $\ket{\phi(x_2)}$ for two classical data points $x_1, x_2$ from some training set, which equals $\mathcal{F}(\ket{\psi}, \ket{\phi})$ and thus falls exactly into the case above.

VarQTE for real or imaginary time evolution assumes a given parametrized quantum state $\ket{\psi(\theta)}$ and then projects the exact state evolution to the parameter evolution of the ansatz. This approximates the desired time evolution in the sub-space that the ansatz can represent. The exact projection requires the evaluation of the quantum geometric tensor (QGT) McArdle_2019_varqte ; Yuan_2019_varqte ; Zoufal_2023_varqte_error_bounds . However, that quickly becomes prohibitive as the number of parameters increases. Thus, multiple approximate variants of VarQTE have been proposed that work around the evaluation of the QGT Gacon_2021_qnspsa ; gacon2023stochastic ; gacon2023variational . Many of these approximations leverage the fact that the Hessian of the fidelity $|\langle \psi(\theta) | \psi(\theta + \delta\theta) \rangle|^2$ with respect to $\delta\theta$ is proportional to the QGT of $\ket{\psi(\theta)}$ up to higher order terms. They either use Simultaneous Perturbation Stochastic Approximation (SPSA) to estimate the Hessian from evaluations of the fidelity as approximations of the QGT, or they construct alternative loss functions that directly leverage the mentioned fidelity without constructing an approximate QGT. In all variants, the parameter disturbances $\delta\theta$ are small, which implies fidelities close to one. Thus, this is in the regime where the noisy CVaR is very close to the noise-free expectation value, i.e., the sweet spot of the introduced approximation.
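For reference, the relation between the fidelity and the QGT can be written out explicitly. The following is the standard second-order expansion around $\delta\theta = 0$; the symbol $g(\theta)$ for the real part of the QGT is notation introduced here for illustration and is not used elsewhere in the text.

```latex
\begin{align}
|\langle \psi(\theta) | \psi(\theta + \delta\theta) \rangle|^2
  &= 1 - \delta\theta^{T} g(\theta)\, \delta\theta
     + \mathcal{O}(\|\delta\theta\|^{3}), \\
g_{ij}(\theta)
  &= \operatorname{Re}\!\left[
       \langle \partial_i \psi | \partial_j \psi \rangle
       - \langle \partial_i \psi | \psi \rangle \langle \psi | \partial_j \psi \rangle
     \right].
\end{align}
```

The fidelity is maximal at $\delta\theta = 0$, so the first-order term vanishes and the Hessian with respect to $\delta\theta$ equals $-2\, g(\theta)$, which is the proportionality referred to above.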

V Experiments

Within this section, we analyze two optimization problems from the literature to demonstrate the theory presented in this paper. In both cases, we run QAOA circuits on ibm_sherbrooke ibm_quantum_devices . We first consider smaller but deeper circuits and then larger but shallower circuits. We always find good agreement between the theory and the experimental results. All results within this section are achieved without twirling the circuits. For a comparison and discussion of twirled and untwirled circuits see Appendix A.

ibm_sherbrooke is a 127-qubit superconducting device with an echoed cross-resonance (ECR) gate as its two-qubit gate Sheldon2016 . This gate is equivalent to a CNOT gate up to single-qubit gates and has a clear direction on the hardware. We let the transpiler take care of the mapping from CNOT gates to ECR gates and will in the following write about CNOT gates for better readability.

V.1 QAOA for MAXCUT on 3-regular graphs with 40 nodes

Figure 1: QAOA results on 40 qubits. The curve is the cumulative distribution function resulting from sampling the circuits for a MAXCUT instance executed on ibm_sherbrooke for $p=1$ with $10^5$ shots (top) and $p=2$ with $10^7$ shots (bottom). The vertical lines show the corresponding noisy expectation values (dashed blue), the noise-free expectation values evaluated using light-cone optimized classical simulation (cyan dashed-dotted), the $\overline{\operatorname{CVaR}}_{\alpha_p}$ (cyan dotted), and the globally optimal solution equal to $56$ (green solid). The title shows the fitted $\alpha_p'$ such that the $\overline{\operatorname{CVaR}}_{\alpha_p'}$ are equal to the noise-free expectation values (i.e., cyan dashed-dotted).

In this section, we examine QAOA for MAXCUT on a random 3-regular graph with 40 nodes, i.e., on 40 qubits. We take the problem instance from Ref. Sack2023 and optimize the parameters classically for QAOA with depth $p=1$ and $p=2$ using light-cone simplifications. This allows us to evaluate the required 2-local expectation values by simulating at most 14 qubits at a time, see details in Ref. Sack2023 . The circuits and optimal parameters are further discussed in Appendix E.

We apply staggered dynamic decoupling for error suppression, as discussed in Appendix F. The circuits are constructed such that they consist of only two different layers of CNOT gates on a line of 40 qubits, denoted by $q_0, \ldots, q_{39}$. The first layer is composed of 20 CNOT gates on qubits $(q_i, q_{i+1})$ for $i$ even and the second is composed of 19 CNOT gates on $(q_i, q_{i+1})$ for $i$ odd. Using the technique introduced in Ref. mckay2023benchmarking , the measured LF for these two layers is $LF_1 = 0.7686$ and $LF_2 = 0.7444$, respectively. (Footnote 1: At the time of writing, the experiment to measure layer fidelity is under implementation in Qiskit Experiments QiskitExperiments . See https://github.com/Qiskit-Extensions/qiskit-experiments.) We take the geometric average over the total number of CNOT gates and derive a CNOT fidelity as $\mathcal{F}_{CX} = (LF_1 \times LF_2)^{1/39} = 0.9858$. This also allows us to compute the error per layered gate (EPLG) of Ref. mckay2023benchmarking as $1 - \mathcal{F}_{CX} = 0.0142$. We also define $\gamma_{CX} = 1/\mathcal{F}_{CX}^2 = 1.0290$. In total, the circuits for $p=1$ and $p=2$ have 461 and 922 CNOT gates, respectively, all in the form of the aforementioned layers. We can thus compute the sampling overhead for $p=1$ and $p=2$ as $\sqrt{\gamma_1} = 735.0$ and $\sqrt{\gamma_2} = 540275.9$, respectively, which corresponds to $\alpha_1 = 1.361 \times 10^{-3}$ and $\alpha_2 = 1.851 \times 10^{-6}$. A regularly measured EPLG evaluated over a chain of 100 qubits is provided for ibm_sherbrooke in the IBM Quantum Platform ibm_quantum_devices . At the time of the experiment the backend reported an EPLG of $0.017$, which is slightly higher than our measured EPLG. This is expected, since we restrict to 40 qubits. In any case, the EPLG reported by the backend is a good first proxy to estimate the LF and resulting $\gamma$ when executing a particular circuit on a device.
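The chain of quantities above can be reproduced with a few lines of arithmetic. The sketch below assumes that $\gamma_p = \gamma_{CX}^{N_p}$ with $N_p$ the number of CNOT gates, which is consistent with the reported values; the variable names are hypothetical. Because it starts from the rounded layer fidelities quoted in the text, the results agree with the reported numbers only up to rounding (roughly $735$ and $5.4 \times 10^{5}$ for the two sampling overheads).

```python
import math

LF_1, LF_2 = 0.7686, 0.7444      # measured layer fidelities quoted in the text
GATES_IN_BOTH_LAYERS = 20 + 19   # CNOT gates in the two layers combined

f_cx = (LF_1 * LF_2) ** (1.0 / GATES_IN_BOTH_LAYERS)  # geometric-mean CNOT fidelity
eplg = 1.0 - f_cx                                     # error per layered gate
gamma_cx = 1.0 / f_cx**2                              # per-CNOT gamma


def sampling_overhead(num_cnots):
    """sqrt(gamma) for a circuit with num_cnots CNOT gates (assumes gamma = gamma_cx**num_cnots)."""
    return gamma_cx ** (num_cnots / 2.0)


num_cnots = {1: 461, 2: 922}  # CNOT counts for p = 1 and p = 2
sqrt_gamma = {p: sampling_overhead(n) for p, n in num_cnots.items()}
alpha = {p: 1.0 / s for p, s in sqrt_gamma.items()}
```

Doubling the number of CNOT gates squares the sampling overhead, which is why $\sqrt{\gamma_2} \approx (\sqrt{\gamma_1})^2$.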

| | $p=1$ | $p=2$ |
|---|---|---|
| global optimum | 56 | 56 |
| $\mathbb{E}[\widetilde{X}]$ | 30.2 | 29.9 |
| $\mathbb{E}[X]$ | 41.5 | 45.3 |
| $\overline{\operatorname{CVaR}}_{\alpha_p}$ | 43.1 | 48.5 |
| best sampled value | 47 | 50 |
| number of CNOT gates | 461 | 922 |
| $\sqrt{\gamma_p}$ | $735.0$ | $540275.9$ |
| $\alpha_p$ | $1.361 \times 10^{-3}$ | $1.851 \times 10^{-6}$ |
| $\alpha_p'$ | $5.180 \times 10^{-3}$ | $1.071 \times 10^{-4}$ |
| $\gamma_{CX}$ | 1.0290 | 1.0290 |
| $\gamma_{CX,p}'$ | $1.0231$ | $1.0200$ |

Table 1: QAOA results on 40 qubits: This table shows the different results for $p=1$ and $p=2$ when running QAOA on the introduced 40-qubit MAXCUT instance. It shows the noisy and noise-free expectation values as well as the CVaR estimates, best sampled values and the global optimal value. Further, it shows the total number of CNOT gates, the overall $\sqrt{\gamma_p}$ for the circuits, the $\alpha_p$ derived from the LF as well as the $\alpha_p'$ derived from calibrating the CVaR on the noise-free expectation values, and the corresponding $\gamma_{CX}$ and $\gamma_{CX,p}'$. The global optimum and $\gamma_{CX}$ are the same for both depths.

To apply the CVaR bounds, we run the circuits for $p=1$ with $10^5$ shots and for $p=2$ with $10^7$ shots. This corresponds to 137 and 19 samples that remain to estimate the CVaR after sorting them and keeping the best $\alpha_1$ and $\alpha_2$ fraction, respectively. The data confirm that $\overline{\operatorname{CVaR}}_{\alpha_p}$ provides an upper bound (since MAXCUT is a maximization problem) to the noise-free expectation values, as predicted, see Fig. 1 and Tab. 1. The CVaR upper bound exceeds the noise-free value by $3.9\%$ for $p=1$ and by $7.1\%$ for $p=2$.
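The sample counts and percentages in this paragraph follow from the values in Tab. 1, as the short check below illustrates. It assumes that the number of retained samples is $\lceil \alpha_p N \rceil$ for $N$ shots, which matches the reported counts; the variable names are hypothetical.

```python
import math

shots = {1: 10**5, 2: 10**7}
alpha = {1: 1.361e-3, 2: 1.851e-6}   # alpha_p from Tab. 1
noise_free = {1: 41.5, 2: 45.3}      # noise-free expectation values from Tab. 1
cvar_bound = {1: 43.1, 2: 48.5}      # upper CVaR values from Tab. 1

# Number of samples kept after sorting and retaining the best alpha_p fraction.
kept = {p: math.ceil(alpha[p] * shots[p]) for p in shots}

# Relative amount by which the CVaR upper bound exceeds the noise-free value.
excess_percent = {
    p: 100.0 * (cvar_bound[p] - noise_free[p]) / noise_free[p] for p in shots
}
```

The small number of retained samples for $p=2$ illustrates why the number of shots has to grow with $\sqrt{\gamma}$ to keep the variance of the CVaR estimate under control.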

We also use the noise-free expectation values obtained from the light-cone simulation to calibrate an $\alpha$ such that the CVaR matches the noise-free result exactly, denoted by $\alpha_p'$. This allows us to derive an induced effective $\gamma_{CX,p}'$ and compare it to the true $\gamma_{CX}$. We find that $\gamma_{CX,p}'$ is quite stable across the different $p$ and significantly smaller than $\gamma_{CX}$, see Tab. 1. This may imply that the observable of interest is not affected by all the errors that may occur. Crucially, this observation may allow us to calibrate $\alpha$ for a particular application and choose larger values than implied by the LF, e.g., by running circuits of similar structure but with known noise-free results. This may reduce the sampling overhead in certain scenarios while still achieving good results. However, in general, the lower/upper bounds proven in Sec. III will no longer hold for $\alpha > 1/\sqrt{\gamma}$.
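One simple way to perform such a calibration on a finite set of samples is to scan over the number of kept samples and pick the fraction whose CVaR is closest to the known noise-free value. The sketch below is an assumption about how this could be done; the function name `calibrate_alpha` and the toy data are hypothetical, and on finite data the match is only approximate.

```python
from itertools import accumulate


def calibrate_alpha(samples, target, maximize=True):
    """Return the fraction k/N whose CVaR (mean of the best k samples)
    is closest to the target value."""
    ordered = sorted(samples, reverse=maximize)
    best_k, best_gap = 1, float("inf")
    # Running sums give the mean of the best k samples for every k in one pass.
    for k, running_sum in enumerate(accumulate(ordered), start=1):
        gap = abs(running_sum / k - target)
        if gap < best_gap:
            best_k, best_gap = k, gap
    return best_k / len(ordered)


# Toy example (illustrative values only): the mean of the best 3 of 5
# samples equals the target, so the calibrated fraction is 3/5.
alpha_fit = calibrate_alpha([10, 8, 6, 4, 2], target=8.0)
```

Because the mean of the best $k$ samples changes monotonically with $k$, a bisection over $k$ would also work for large sample sets.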

Comparing the $\overline{\operatorname{CVaR}}_{\alpha_p}$ and the best samples with the globally optimal solution, we find that they achieve approximation ratios of $0.770$ (CVaR) and $0.839$ (best sample) for $p=1$, and $0.866$ (CVaR) and $0.892$ (best sample) for $p=2$. All these numbers exceed the corresponding theoretical lower bounds of $0.692$ ($p=1$) and $0.756$ ($p=2$) discussed in Sec. IV.1.

V.2 QAOA on Hardware-efficient Higher-Order Ising Model with 127 variables

Figure 2: Example heavy-hex hardware-compatible $127$-qubit higher-order Ising model. Nodes denote the linear terms, edges between nodes denote the quadratic terms, and the ovals encircling three neighboring nodes on the hardware graph denote hyper-edges. Polynomial (Ising model) coefficients of $-1$ are denoted by red, and $+1$ are denoted by blue.

We now show results of running QAOA on higher-order spin glass models. Originally described in Refs. pelofske2023qavsqaoa ; pelofske2023short , these models are designed for a heavy-hex connectivity graph Chamberland_2020 of ibm_sherbrooke.

We define a minimization problem for the following cost Hamiltonian corresponding to a random coefficient spin glass problem with cubic terms and a connectivity graph that is defined to be compatible with an arbitrary heavy-hex lattice graph $G=(V,E)$, see Fig. 2:

\begin{align}
H ={}& \sum_{v\in V} d_v \cdot Z_v + \sum_{(i,j)\in E} d_{i,j} \cdot Z_i \otimes Z_j \nonumber\\
& + \sum_{l\in W} d_{l,n_1(l),n_2(l)} \cdot Z_l \otimes Z_{n_1(l)} \otimes Z_{n_2(l)}. \tag{15}
\end{align}

As $G$ is a connected bipartite graph with vertices $V=\{0,\ldots,n-1\}$, it is uniquely bipartitioned as $V=V_2\sqcup V_3$ with $E\subset V_2\times V_3$, where $V_i$ consists of vertices of degree at most $i$. With $W\subseteq V_2$ in (15), we denote the subset of vertices in $V_2$ of degree exactly $2$. Each node $l$ in $W$ has two neighbors, denoted by $n_1(l)$ and $n_2(l)$. Thus $d_v$, $d_{i,j}$, and $d_{l,n_1(l),n_2(l)}$ are the coefficients representing the random selection of the linear, quadratic, and cubic coefficients, respectively. The random coefficients are chosen from $\{+1,-1\}$ with equal probability. An example of such a random higher-order Ising model is shown in Fig. 2.
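To make the structure of Eq. (15) concrete, the following sketch builds the linear, quadratic, and cubic terms for a small toy graph and evaluates the cost of a spin assignment $z\in\{+1,-1\}^n$. The 7-vertex fragment, the function names, and the seed are illustrative assumptions; they are not the 127-qubit instance or the authors' code.

```python
import random

# Hypothetical toy fragment: vertex 0 has degree 3, vertices 1, 2, 3 form
# V_2 (degree exactly 2), and vertices 4, 5, 6 terminate the fragment.
VERTICES = list(range(7))
EDGES = [(1, 0), (1, 4), (2, 0), (2, 5), (3, 0), (3, 6)]
V2 = [1, 2, 3]


def build_terms(vertices, edges, v2):
    """Return the qubit supports of all terms in Eq. (15)."""
    neighbors = {v: [] for v in vertices}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    linear = [(v,) for v in vertices]
    quadratic = [tuple(edge) for edge in edges]
    # W: vertices of V_2 with degree exactly 2, each giving one cubic term.
    cubic = [(l, *neighbors[l]) for l in v2 if len(neighbors[l]) == 2]
    return linear + quadratic + cubic


def random_coefficients(terms, seed=None):
    """Draw each coefficient from {+1, -1} with equal probability."""
    rng = random.Random(seed)
    return [rng.choice((+1, -1)) for _ in terms]


def energy(terms, coeffs, z):
    """Evaluate the diagonal cost for spins z[q] in {+1, -1}."""
    total = 0
    for support, d in zip(terms, coeffs):
        product = d
        for q in support:
            product *= z[q]
        total += product
    return total


TERMS = build_terms(VERTICES, EDGES, V2)
COEFFS = random_coefficients(TERMS, seed=0)
```

Sampled bit strings $x\in\{0,1\}^n$ can be mapped to spins via $z_q = 1-2x_q$ before calling `energy`, assuming the usual convention that $Z$ has eigenvalue $+1$ on $|0\rangle$.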

We use the qubits in $V_2$ as targets into which we compute and uncompute the parities of the $ZZ$ and $ZZZ$ terms that contain them. The unitaries $e^{-i\gamma ZZ}$ and $e^{-i\gamma ZZZ}$ are then realized with $R_z(2\gamma)$-rotations on these parity qubits. Computing and uncomputing parities needs $1+1$ and $2+2$ CNOT gates for the quadratic and cubic terms, respectively; however, the CNOT gates for $Z_l Z_{n_1(l)}$ and $Z_l Z_{n_2(l)}$ can be subsumed into the CNOT gates for $Z_l Z_{n_1(l)} Z_{n_2(l)}$.

Furthermore, $G$ as a bipartite graph of maximum degree 3 admits a 3-edge-coloring due to Kőnig’s line coloring theorem, meaning that these $2+2$ CNOT gates can be scheduled simultaneously for all terms in just $3+3$ non-overlapping layers pelofske2023qavsqaoa . Depth-$p$ QAOA circuits for these problems thus have a CNOT depth of only $6p$, independent of the system size $n$. Further circuit details are given in Appendix G.

Leveraging parameter transfer of QAOA angles for problems with the same structure but varying numbers of qubits allows us to obtain good angles for these $127$-qubit QAOA circuits for $p=1,\ldots,5$ without on-device variational learning heavy_hex_QAOA_parameter_transfer2023 . Additionally, we utilize converged MPS simulations with a bond dimension of $\chi=2048$ to verify that the fixed QAOA angles produce good expectation values for all circuits heavy_hex_QAOA_parameter_transfer2023 . The hardware-compatible circuits are run on the ibm_sherbrooke device, again using staggered dynamic decoupling for error suppression, see Appendix F. The optimal solutions of the higher-order Ising models were computed using CPLEX cplexv12 ; heavy_hex_QAOA_parameter_transfer2023 .

As before in Sec. V.1, we only have a small number of unique layers of CNOT gates. Since we want to cover a graph of degree three, we need at least three layers, see Appendix G, with 144 CNOT gates in total. The measured LF for the three layers is $LF_1=0.056926$, $LF_2=0.029630$, and $LF_3=0.167959$. These fidelities are significantly smaller than for the 40-qubit circuits in Sec. V.1. The reason is that the qubits and gates on a 127-qubit device are not all of the same quality: some are always better and some worse. For 40 qubits, we could select the best line of 40 qubits (see Appendix E), while for 127 qubits we had to use the whole chip. From this we can again compute the CNOT fidelity $\mathcal{F}_{CX}=(LF_1\times LF_2\times LF_3)^{1/144}=(0.000283)^{1/144}=0.944850$, $EPLG=0.055150$, and $\gamma_{CX}=1.120146$.
The results for evaluating the circuits on ibm_sherbrooke, each with $10^5$ shots, are provided in Fig. 3 and Tab. 2. With the significantly lower fidelities, the number of shots required to apply the analytic CVaR bounds is significantly higher and currently impractical to run. However, like in Sec. V.1, we see that the effective $\gamma_{CX}$ is significantly smaller, even smaller than for the longer 40-qubit circuits. Further, we see that the noisy expectation values still improve from $p=1$ to $p=4$ and only start to get worse for $p=5$.
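The arithmetic behind these numbers can be reproduced with a short script. The relations used below, $EPLG = 1-\mathcal{F}_{CX}$, $\gamma_{CX}=\mathcal{F}_{CX}^{-2}$, $\alpha_p = 1/\sqrt{\gamma_p} = \mathcal{F}_{CX}^{\#\text{CNOT}}$, and $\gamma_{CX,p}' = (\alpha_p')^{-2/\#\text{CNOT}}$, are assumptions inferred from the quoted values and from Tab. 2; the function names are hypothetical.

```python
import math

# Layer fidelities and CNOT count quoted in the text.
LAYER_FIDELITIES = (0.056926, 0.029630, 0.167959)
CNOTS_IN_LAYERS = 144


def cnot_fidelity(layer_fidelities, n_cnots):
    """Geometric-mean fidelity per CNOT: (prod_i LF_i)^(1 / n_cnots)."""
    return math.prod(layer_fidelities) ** (1.0 / n_cnots)


def alpha_from_fidelity(f_cx, n_cnots_circuit):
    """Assumed relation alpha_p = 1 / sqrt(gamma_p) = F_CX^(#CNOT)."""
    return f_cx ** n_cnots_circuit


def effective_gamma_cx(alpha_fit, n_cnots_circuit):
    """Invert alpha' = gamma'^(-#CNOT / 2) for the per-gate gamma'."""
    return alpha_fit ** (-2.0 / n_cnots_circuit)


f_cx = cnot_fidelity(LAYER_FIDELITIES, CNOTS_IN_LAYERS)
eplg = 1.0 - f_cx                  # assumed: EPLG = 1 - F_CX
gamma_cx = f_cx ** -2              # assumed: gamma_CX = 1 / F_CX^2

# p = 1 circuit with 288 CNOT gates (first row of Tab. 2).
alpha_1 = alpha_from_fidelity(f_cx, 288)
sqrt_gamma_1 = 1.0 / alpha_1
gamma_cx_eff_1 = effective_gamma_cx(0.4602, 288)  # alpha_1' from Tab. 2
```

Under these assumptions the script agrees with the quoted $\mathcal{F}_{CX}$, $EPLG$, and $\gamma_{CX}$, and with the $p=1$ entries for $\sqrt{\gamma_p}$, $\alpha_p$, and $\gamma_{CX,p}'$ in Tab. 2 up to rounding.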

Last, we use bootstrapping to confirm the scaling of the CVaR variance with respect to $\alpha$. More precisely, we uniformly sample $10^5$ values from the results collected using ibm_sherbrooke and estimate the CVaR for the five values of $\alpha_p'$ reported in Tab. 2. We repeat this $10^4$ times to estimate the variance of the resulting CVaR estimators. The results are provided in Fig. 4 and show close agreement with the theory presented in Sec. III.
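The resampling procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, the input data would be the measured objective values (not included here), and the default number of resamples is far smaller than the $10^4$ repetitions used in the text.

```python
import math
import random
import statistics


def cvar_lower(samples, alpha):
    """CVaR for a minimization problem: mean of the best (smallest)
    ceil(alpha * N) samples."""
    k = max(1, math.ceil(alpha * len(samples)))
    return statistics.fmean(sorted(samples)[:k])


def bootstrap_cvar_variance(data, alpha, n_resamples=200, seed=0):
    """Resample the data with replacement, re-estimate the CVaR each time,
    and return the variance of the resulting estimates."""
    rng = random.Random(seed)
    estimates = [
        cvar_lower(rng.choices(data, k=len(data)), alpha)
        for _ in range(n_resamples)
    ]
    return statistics.pvariance(estimates)
```

Calling `bootstrap_cvar_variance` for several values of $\alpha$ on the same data and plotting the result against $1/\alpha$ is one way to check the predicted scaling.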

Figure 3: QAOA results for sampling a random hardware-compatible higher-order Ising model (minimization combinatorial optimization problem) on 127 qubits: This figure shows the resulting distributions from 127-qubit circuits executed on ibm_sherbrooke for $p=1,\ldots,5$ (top to bottom). The cumulative distribution functions show the values of the resulting samples from $10^5$ shots for every $p$. The vertical lines show the corresponding noisy expectation values (dashed blue), the noise-free expectation values evaluated using MPS simulation (cyan dashed-dotted), and the globally optimal solution equal to $-188$ (green solid). The title shows the fitted $\alpha_p'$ such that the $\operatorname{CVaR}_{\alpha_p'}$ are equal to the noise-free expectation values (i.e., cyan dashed-dotted). The corresponding $\alpha_p'$ are indicated by the horizontal dashed red line.
| $p$ | $\#\text{CNOT}$ | $\operatorname{tr}(\rho H)$ | $\operatorname{tr}(\widetilde{\rho} H)$ | $f_{\text{best}}$ | $\sqrt{\gamma_p}$ | $\alpha_p$ | $\alpha_p'$ | $\gamma_{CX,p}'$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 288 | $-79.79$ | $-64.54$ | $-136$ | $1.246\times 10^{7}$ | $8.026\times 10^{-8}$ | $0.4602$ | $1.0054$ |
| 2 | 576 | $-109.35$ | $-81.11$ | $-154$ | $1.553\times 10^{14}$ | $6.441\times 10^{-15}$ | $0.1310$ | $1.0071$ |
| 3 | 864 | $-125.37$ | $-86.97$ | $-154$ | $1.935\times 10^{21}$ | $5.169\times 10^{-22}$ | $0.0305$ | $1.0081$ |
| 4 | 1152 | $-137.22$ | $-88.46$ | $-156$ | $2.410\times 10^{28}$ | $4.149\times 10^{-29}$ | $0.0059$ | $1.0090$ |
| 5 | 1440 | $-145.54$ | $-85.78$ | $-164$ | $3.003\times 10^{35}$ | $3.330\times 10^{-36}$ | $0.0011$ | $1.0096$ |
Table 2: QAOA results on 127 qubits: This table shows the different results for $p=1,\ldots,5$ when running QAOA on the introduced 127-qubit spin glass instance. It shows the number of CNOT gates per circuit, the noise-free and noisy expectation values, and the best sampled values. Further, it shows the overall $\sqrt{\gamma_p}$ for the circuits and the corresponding $\alpha_p$ derived from the LF, as well as the $\gamma_{CX,p}'$ and $\alpha_p'$ derived from calibrating the CVaR on the noise-free expectation values.
Figure 4: Variance of CVaR estimates: We draw $10^5$ uniform samples from the original data to estimate the CVaR for $\alpha_p'$, $p=1,\ldots,5$, cf. Tab. 2, and repeat this $10^4$ times to get an estimate of the variance of the CVaR estimator. The dashed green line is fitted to the results for $p=3,\ldots,5$, and is very close to the predicted behavior of $\mathcal{O}(1/\alpha)$.

VI Conclusion

We examined how hardware noise affects the quality of bit strings sampled from quantum circuits on noisy quantum computers. We proved and demonstrated that the noise can be compensated by increasing the number of samples inversely proportional to the circuit’s layer fidelity, or equivalently, proportional to $\sqrt{\gamma}$. This is considerably less than the overhead required by error mitigation strategies like probabilistic error cancellation, which scales as $\gamma^2$ but yields unbiased estimators of expectation values instead of bounds. Furthermore, we proved that the Conditional Value at Risk provides bounds on noise-free expectation values using noisy samples, which provides the theoretical foundation for CVaR as a loss function in variational algorithms and thus closes a gap in the literature. We also discussed the potential of this theory to benefit other algorithms, such as Quantum Support Vector Machines or approximate Variational Quantum Time Evolution.

Our primary focus was on errors occurring during circuit execution. However, other error sources, notably State Preparation and Measurement (SPAM) errors, also affect performance on noisy devices. The methodologies developed in this paper can be adapted to account for SPAM errors, either by increasing the sampling overhead or by applying other mitigation techniques, like statistical readout error mitigation. The latter may allow us to mitigate certain errors without added sampling overhead but might require additional calibration circuits. Investigating the impact of SPAM errors remains an intriguing direction for future research.

Acknowledgments. The authors want to thank Almudena Carrera Vazquez, Julien Gacon, Youngseok Kim, David McKay, Diego Ristè, David Sutter, Kristan Temme, Minh Tran, and James Wootton for insightful discussions and recommendations to improve the theoretical and experimental results as well as the whole manuscript. Further, M.L. and S.W. acknowledge the support of the Swiss National Science Foundation, SNF grant No. 214919. E.P., A.B., and S.E. acknowledge the support of (i) the Beyond Moore’s Law thrust of the Advanced Simulation and Computing Program (NNSA ASC) at Los Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC, for the National Nuclear Security Administration of U.S. Department of Energy (Contract No. 89233218CNA000001), and (ii) LANL’s Institutional Computing program. LANL report LA-UR-23-33295.

Appendix A Assumption of Pauli noise

Within the theory of the paper, we made the simplifying assumption of Pauli noise. This assumption does not hold in general. Consider a Clifford quantum circuit layer $\mathcal{U}(\cdot)=U\cdot U^{\dagger}$ on $n$ qubits and its noisy version $\widetilde{\mathcal{U}}=\mathcal{U}\circ\Lambda$. A more realistic description of the noise is given by

\begin{align}
\Lambda(\rho) &= \sum_i A_i \rho A_i^{\dagger}\,, \tag{16}
\end{align}

where the $A_i$ are Kraus operators nielsen_and_chuang , which leads to

\begin{align}
\widetilde{\mathcal{U}}(\rho) &= \sum_i A_i U \rho U^{\dagger} A_i^{\dagger}\,. \tag{17}
\end{align}

Applying Pauli twirling knill_randomized_2008 ; dankert_exact_2009 ; magesan_scalable_2011 , i.e., averaging over $\mathcal{U}$ conjugated by each element of the Pauli group on $n$ qubits, yields

\begin{align}
\widetilde{\mathcal{U}}_{\text{twirled}}(\rho) &= \frac{1}{4^n}\sum_{i,j} Q_j A_i U P_j \rho P_j U^{\dagger} A_i^{\dagger} Q_j\,, \tag{18}
\end{align}

for Paulis $P_j, Q_j$ with $Q_j U P_j = U$ for all $j=1,\ldots,4^n$. This is known to translate the more general noise given in (17) on average to a Pauli noise model as given in (1). In practice, we do not enumerate all $4^n$ Paulis, but uniformly sample from them and apply a certain number of random Paulis to approximate the average.
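As a small numerical illustration of Eq. (18), the sketch below twirls a single-qubit layer and checks that a coherent (non-Pauli) error turns into a Pauli channel. The choices made here are assumptions for illustration only: $U$ is a Hadamard gate, the noise has a single Kraus operator $A=e^{-i\theta X/2}$, and all function names are hypothetical. For $n=1$ all four Paulis are enumerated instead of being sampled.

```python
import math

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
PAULIS = [I2, X, Y, Z]
R = 1 / math.sqrt(2)
HADAMARD = [[R, R], [R, -R]]  # assumed Clifford layer U


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]


def conjugate_by(op, rho):
    """Return op rho op^dagger."""
    return matmul(matmul(op, rho), dagger(op))


def coherent_error(theta):
    """Kraus operator A = exp(-i theta X / 2), a coherent over-rotation."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s], [-1j * s, c]]


def twirled_layer(rho, theta, u=HADAMARD):
    """Average Q_j A U P_j rho P_j U^dagger A^dagger Q_j over all Paulis."""
    a = coherent_error(theta)
    out = [[0j, 0j], [0j, 0j]]
    for p in PAULIS:
        q = conjugate_by(u, p)  # Q_j = U P_j U^dagger, hence Q_j U P_j = U
        op = matmul(matmul(q, a), matmul(u, p))
        term = conjugate_by(op, rho)
        for i in range(2):
            for j in range(2):
                out[i][j] += term[i][j] / 4
    return out


def pauli_noise_layer(rho, theta, u=HADAMARD):
    """Ideal layer followed by the Pauli channel (1-q) sigma + q X sigma X,
    with q = sin^2(theta / 2)."""
    sigma = conjugate_by(u, rho)
    flipped = conjugate_by(X, sigma)
    q = math.sin(theta / 2) ** 2
    return [[(1 - q) * sigma[i][j] + q * flipped[i][j] for j in range(2)]
            for i in range(2)]
```

In this toy case the two functions return the same state for any input, and the probability of no error in the twirled channel is $\cos^2(\theta/2)$.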

Suppose now we have a noise model that, on average, looks like Pauli noise. Then, expectation values $\operatorname{tr}(\rho H)$ will have the same value for a true Pauli noise model as for a twirled general model. That also holds if we set $H=|x\rangle\!\langle x|$, i.e., if we evaluate the probability of sampling $|x\rangle$. Hence, if we estimate the same sampling probability for the actual Pauli noise model and for the twirled noise model, the two sampling probabilities must also agree.

For the experiments in Sec. V, we omitted twirling. There are some special cases of noise models where we know that the theory holds unchanged. For instance, suppose stochastic noise wallman2016bounding with $A_0\sim\sqrt{I}$ and all other $A_i$, for $i>0$, orthogonal to $A_0$. Then, it can easily be seen that the probability of having no error is equal to the probability of the Pauli noise resulting after twirling, i.e., equal to $1/\sqrt{\gamma}$. While we can always construct a noise channel with all orthogonal Kraus operators, it is not guaranteed that the identity is part of it. In general, we can only say that the probability of no error in the general noise model is less than or equal to $1/\sqrt{\gamma}$ Wallman_2016 ; wallman2016bounding .

However, the gap between the twirled and untwirled circuits appears to be very small in the considered cases. We demonstrate this by comparing the distributions resulting from the twirled and untwirled circuits. In Fig. 5 we show the experimental distributions obtained when sampling, on the ibm_sherbrooke device, from the same 127-qubit circuits discussed in Sec. V.2. This shows close agreement with and without twirling.

We note that the observed distributions in Fig. 5 deviate slightly from those presented in Fig. 3. This is because in order to twirl the circuits, we need to insert additional single qubit gates, which contribute to a slightly deeper circuit, here, about 8% longer in the pulse schedule duration than the original circuits. In some cases this could be reduced by combining the twirling gates with other single qubit gates. However, if the additional gates are inserted, e.g., in between two CNOT gates, this is not possible. The circuits for the untwirled case have the same structure as the twirled case, except that the sampled twirling gates are constant, so that there is a fair comparison between the two due to the additional circuit duration.

We also note that the minimum values of the objective functions for the twirled case are lower than for the untwirled case. However, since the opposite is true for the mean value of the objective function, we believe this may be due to sampling statistics, as in each of these cases the minimum objective value was sampled only once. If we determine $\alpha_p'$ as before for each case, we find that the twirled and untwirled values agree well for each $p$ and are well within a standard deviation of each other (determined by bootstrapping the observed bitstrings). This is summarized in Tab. 3.

Figure 5: QAOA results on 127 qubits run on ibm_sherbrooke. This figure shows the cumulative distribution functions (CDFs) for depths $p=1,2$ with and without twirling. The orange lines correspond to the twirled circuits, and the blue lines correspond to the untwirled circuits. For each method, $100{,}000$ shots were used in total. When twirling, we sampled $1{,}000$ random twirls and performed $100$ shots for each. The statistics of these distributions are summarized in Table 3.
| $p$ | Twirling | $\text{tr}(\rho H)$ | $\text{tr}(\widetilde{\rho} H)$ | $f_{\text{best}}$ | $\alpha_p'$ |
|---|---|---|---|---|---|
| 1 | No | $-79.8$ | $-60.8$ | $-128$ | $0.147$ (7.9%) |
| 1 | Yes | $-79.8$ | $-60.9$ | $-144$ | $0.152$ (7.7%) |
| 2 | No | $-109.4$ | $-74.9$ | $-144$ | $0.0202$ (22.4%) |
| 2 | Yes | $-109.4$ | $-72.9$ | $-148$ | $0.0160$ (25.7%) |
Table 3: Values of the objective function obtained with and without twirling for QAOA depths $p=1,2$, corresponding to the distributions shown in Fig. 5. The noise-free values are the expectation values of the observable obtained using classical MPS simulations (rounded to one decimal place) heavy_hex_QAOA_parameter_transfer2023 . The standard deviations for $\alpha_p'$ (shown as percent of the nominal value) are determined by bootstrapping over the observed bitstrings. We note that the $\alpha_p'$ here are lower than those in Fig. 3 for the same reason that the observed values differ, as described in Appendix A. Nevertheless, the qualitative conclusions still hold, and the twirled and untwirled cases agree well.

Appendix B Probabilistic Error Cancellation & Sampling

In this section, we discuss how applying PEC berg2023probabilistic to quantum circuits affects the resulting sampling probabilities. PEC consists of two steps: learning the noise when running a quantum circuit on a particular quantum device, and then mitigating the noise to get an unbiased estimator of an expectation value. Here, we assume we have learned the noise already and focus on the error mitigation. Given a noise model $\Lambda$, PEC constructs a Quasiprobability Decomposition (QPD) to implement the inverse noise by combining multiple weighted quantum circuits.

In a QPD, a quantum operation $\mathcal{U}$ is implemented as a linear combination of other (possibly noisy) operations $\mathcal{E}_i$, $i=1,\ldots,M$,

\begin{align}
\mathcal{U}(\cdot) &= \sum_{i=1}^{M} a_i \mathcal{E}_i(\cdot)\,, \tag{19}
\end{align}

where aisubscript𝑎𝑖a_{i}\in\mathbb{R}italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ blackboard_R, 𝒰(X)=UXU𝒰𝑋𝑈𝑋superscript𝑈\mathcal{U}(X)=UXU^{\dagger}caligraphic_U ( italic_X ) = italic_U italic_X italic_U start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT, isubscript𝑖\mathcal{E}_{i}caligraphic_E start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT denote (noisy) operations, and i=1Mai=1superscriptsubscript𝑖1𝑀subscript𝑎𝑖1\sum_{i=1}^{M}a_{i}=1∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_M end_POSTSUPERSCRIPT italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = 1. This has first been proposed in the context of error mitigation Temme_2017 , where 𝒰𝒰\mathcal{U}caligraphic_U is assumed to be a noise-free operation and isubscript𝑖\mathcal{E}_{i}caligraphic_E start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT are noisy operations that can be implemented on a noisy device. If this is being applied to multiple gates and qubits, the number of necessary operations M𝑀Mitalic_M explodes exponentially. Thus, instead of enumerating all of them, one rewrites (19) as

$$\mathcal{U}(\cdot) = \gamma \sum_{i=1}^{M} p_i s_i \mathcal{E}_i(\cdot)\,, \tag{20}$$

where $\gamma = \|a\|_1 \geq 1$, $p_i = |a_i|/\gamma$, and $s_i = \text{sign}(a_i)$, and samples from the probability distribution defined through $p_i$. Suppose we are interested in estimating $\langle H \rangle = \operatorname{tr}(\mathcal{U}(\rho)H)$ for some initial state $\rho$ and observable $H$. Then, we can use the QPD to write

$$\operatorname{tr}(\mathcal{U}(\rho)H) = \gamma \sum_{i=1}^{M} p_i s_i \operatorname{tr}(\mathcal{E}_i(\rho)H)\,. \tag{21}$$

Thus, instead of enumerating all $M$ circuits, we can sample from $p_i$, and only evaluate the sampled circuits corresponding to $i$, to get an unbiased estimator for $\langle H \rangle$. However, the variance of this estimation is amplified by $\gamma^2$, i.e., $\gamma^2$-times more samples are needed than for the original noise-free circuit to achieve an estimate of the same accuracy. The sampling overhead $\gamma^2$ grows exponentially in the number of qubits and depth of the circuit, and thus, can be prohibitively large for circuits beyond a certain circuit size and noise levels.
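The sampling procedure described above can be sketched numerically. The following is a minimal illustration, not the implementation used for the experiments: the coefficients `a`, the per-circuit expectation values `expvals`, and the function name `qpd_estimate` are hypothetical, and each sampled circuit is assumed to return a single $\pm 1$ measurement outcome.

```python
import numpy as np

def qpd_estimate(a, expvals, num_samples, rng):
    """Estimate sum_i a_i * expvals[i] by sampling circuit indices i ~ |a_i| / gamma."""
    a = np.asarray(a, dtype=float)
    expvals = np.asarray(expvals, dtype=float)
    gamma = np.abs(a).sum()              # gamma = ||a||_1
    p = np.abs(a) / gamma                # sampling distribution p_i
    s = np.sign(a)                       # signs s_i
    idx = rng.choice(len(a), size=num_samples, p=p)
    # One +/-1 measurement outcome per sampled circuit, with mean expvals[i].
    shots = np.where(rng.random(num_samples) < (1 + expvals[idx]) / 2, 1.0, -1.0)
    samples = gamma * s[idx] * shots     # single-shot values of the estimator
    return samples.mean(), samples.var(ddof=1), gamma

# Hypothetical coefficients (summing to 1) and per-circuit expectation values.
a = [1.2, -0.15, -0.05]
expvals = [0.7, 0.3, -0.2]
exact = float(np.dot(a, expvals))
rng = np.random.default_rng(seed=7)
mean, var, gamma = qpd_estimate(a, expvals, 200_000, rng)
print(f"exact={exact:.3f} estimate={mean:.3f} gamma={gamma:.2f} variance={var:.3f}")
```

In this toy setting every single-shot value has magnitude $\gamma$, so its variance is $\gamma^2 - \langle H \rangle^2$, compared to $1 - \langle H \rangle^2$ for a noise-free $\pm 1$ observable, which illustrates the $\gamma^2$ amplification.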

While PEC has only been considered for the estimation of expectation values, it also generates samples from every random circuit that is measured. However, we will show that this essentially amplifies the noise and increases the sampling overhead compared to the results presented in this paper. To this end, we introduce the following mixed state induced by PEC:

$$\rho_{\text{PEC}} = \sum_{i=1}^{M} p_i \mathcal{E}_i(\rho)\,, \tag{22}$$

for some initial state $\rho$. The state $\rho_{\text{PEC}}$ is obtained by dropping the factor $\gamma$ as well as the signs $s_i$ from (20). This allows us to state the following lemma.

Lemma 2.

Consider an $n$-qubit state $\rho = U|0\rangle\!\langle 0|U^{\dagger}$, where $U$ is some unitary, with

$$\operatorname{tr}(\rho|x\rangle\!\langle x|) = p_x \geq 0\,, \tag{23}$$

for a computational basis state $|x\rangle$, $x \in \{0,1\}^n$.

Further, suppose that $U$ can be error-mitigated on a noisy device by using PEC with corresponding $\gamma \geq 1$ and denote the resulting mixed state introduced in (22) by $\rho_{\text{PEC}}$. Then, the probability of measuring $|x\rangle$ on the noisy device using PEC is lower bounded by

$$\operatorname{tr}(\rho_{\text{PEC}}|x\rangle\!\langle x|) = p_x^{\text{PEC}} \geq p_x/\gamma\,. \tag{24}$$
Proof.

Consider the QPD resulting from PEC

$$\mathcal{U}(\cdot) = \sum_{i=1}^{M} a_i \mathcal{E}_i(\cdot)\,. \tag{25}$$

Using (25) we can write

$$p_x = \operatorname{tr}(\rho|x\rangle\!\langle x|) = \sum_{i=1}^{M} a_i \operatorname{tr}(\mathcal{E}_i(|0\rangle\!\langle 0|)|x\rangle\!\langle x|)\,. \tag{26}$$

By defining $\gamma = \|a\|_1$, $p_i = |a_i|/\gamma$, and $s_i = \text{sign}(a_i)$, we can rewrite (26) as

$$p_x = \gamma \sum_{i=1}^{M} p_i s_i \operatorname{tr}(\mathcal{E}_i(|0\rangle\!\langle 0|)|x\rangle\!\langle x|)\,. \tag{27}$$

Further, $s_i \operatorname{tr}(\mathcal{E}_i(|0\rangle\!\langle 0|)|x\rangle\!\langle x|)$ allows us to define a random variable $Y_i \in \{-1, 0, +1\}$ that equals $\pm 1$ if we measure $\mathcal{E}_i(|0\rangle\!\langle 0|)$ and obtain $|x\rangle$, where the sign is determined by $s_i$, and $0$ otherwise. The random variable $Y_i$ satisfies $\mathbb{E}[Y_i] = s_i \operatorname{tr}(\mathcal{E}_i(|0\rangle\!\langle 0|)|x\rangle\!\langle x|)$. We denote the probabilities of $Y_i$ taking the values $-1, 0, +1$ by $q_i^{-1}, q_i^{0}, q_i^{+1} \geq 0$, respectively. Note that by construction, for each $i$ only one of $q_i^{-1}, q_i^{+1}$ can be larger than zero.

In addition, let the probabilities $p_i$ define a random variable $I \in \{1, \ldots, M\}$. Then, by the law of total expectation, we get

\begin{align}
\gamma\,\mathbb{E}[Y_I] &= \gamma \sum_{i=1}^{M} \mathbb{E}[Y_i \mid i]\,\mathbb{P}[i] \tag{28}\\
&= \gamma \sum_{i=1}^{M} p_i s_i \operatorname{tr}(\mathcal{E}_i(|0\rangle\!\langle 0|)|x\rangle\!\langle x|) \tag{29}\\
&= p_x\,. \tag{30}
\end{align}

This can be rewritten as

$$\sum_{i=1}^{M} p_i \left(q_i^{+1} - q_i^{-1}\right) = \frac{p_x}{\gamma}\,. \tag{31}$$

The total probability to measure $|x\rangle$ when applying PEC, independent of the sign of $Y_I$, is then given by

$$\sum_{i=1}^{M} p_i \left(q_i^{+1} + q_i^{-1}\right) \geq \frac{p_x}{\gamma}\,, \tag{32}$$

where the lower bound follows immediately from (31) because $q_i^{-1} \geq 0$, and the left-hand side is exactly the probability of measuring $|x\rangle$ for the state $\rho_{\text{PEC}}$. ∎

Comparing the result of Lemma 2 with the lower bound presented in (2), we see that PEC incurs the squared overhead compared to direct sampling. Further, this implies that CVaR-based approaches may significantly reduce the overhead needed to achieve insightful results, particularly when combined with problem structure to filter noisy samples.
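The bound of Lemma 2 can be checked on a small toy model. The sketch below is an illustration under simplifying assumptions that are not part of the paper: a single classical bit, a bit-flip channel with hypothetical flip probability `eps` as the only noise, and an arbitrarily chosen noise-free distribution `ideal`. In this setting the identity map admits the decomposition $a_1 N + a_2 X N$ with $a = (1 - \epsilon, -\epsilon)/(1 - 2\epsilon)$, where $N$ is the noise and $X$ the bit flip.

```python
import numpy as np

eps = 0.1                        # hypothetical bit-flip probability of the toy noise
ideal = np.array([0.8, 0.2])     # hypothetical noise-free distribution (p_0, p_1)

flip = np.array([[0.0, 1.0], [1.0, 0.0]])     # deterministic bit flip X
noise = (1 - eps) * np.eye(2) + eps * flip    # stochastic bit-flip channel N

# Quasiprobability decomposition of the identity: a_1 * N + a_2 * (X N).
a = np.array([1 - eps, -eps]) / (1 - 2 * eps)
ops = [noise, flip @ noise]

gamma = np.abs(a).sum()          # gamma = ||a||_1 = 1 / (1 - 2 * eps)
p = np.abs(a) / gamma            # sampling probabilities p_i

# Signed combination recovers the noise-free distribution, cf. (26).
recovered = sum(a_i * (op @ ideal) for a_i, op in zip(a, ops))
# Dropping gamma and the signs gives the distribution of rho_PEC, cf. (22).
p_pec = sum(p_i * (op @ ideal) for p_i, op in zip(p, ops))

print("gamma       :", gamma)
print("recovered   :", recovered)
print("p_PEC       :", p_pec)
print("ideal/gamma :", ideal / gamma)
```

For these values the PEC sampling distribution stays above $p_x/\gamma$ for both outcomes, but it is further from the noise-free distribution than the directly sampled noisy distribution `noise @ ideal`.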

Appendix C Variance of Estimating the CVaR

In this section, we present a short exposition on how to estimate CVaR. We will first state the following lemma.

Lemma 3.

Let $X_1, \dots, X_n$ be i.i.d. copies of $X$ (with $X$ integrable) and let $X_{(1)}, \dots, X_{(n)}$ be their order statistics. For $\alpha \in (0,1]$ let $E_n = (X_{(1)} + \cdots + X_{(\lfloor \alpha n \rfloor)})/\lfloor \alpha n \rfloor$. Then

$$\mathbb{E}[E_n] \to \operatorname{CVaR}_{\alpha}(X) \quad \text{as } n \to \infty\,.$$

If $X$ is square integrable and $F_X(x_\alpha) = \alpha$,

$$\sqrt{n}\,(E_n - \operatorname{CVaR}_{\alpha}(X)) \to N(0, \operatorname{CVaRv}_{\alpha}(X))$$

in distribution as $n \to \infty$, where $\operatorname{CVaRv}_{\alpha}(X) := \alpha^{-1} \operatorname{Var}[X \mid X \leq x_\alpha]$ is the limiting variance.

To estimate $\overline{\operatorname{CVaR}}_{\alpha}(X)$, we use the estimator $\overline{E}_n = (X_{(n - \lfloor \alpha n \rfloor + 1)} + \cdots + X_{(n)})/\lfloor \alpha n \rfloor$ and obtain analogous results.
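The two estimators from the lemma translate directly into code. The following is a minimal sketch; the function names `cvar_lower` and `cvar_upper` and the example data are hypothetical choices made for illustration.

```python
import math
import numpy as np

def cvar_lower(samples, alpha):
    """E_n: mean of the floor(alpha * n) smallest samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    k = math.floor(alpha * len(x))
    if k < 1:
        raise ValueError("need floor(alpha * n) >= 1")
    return x[:k].mean()

def cvar_upper(samples, alpha):
    """Upper-tail analogue: mean of the floor(alpha * n) largest samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    k = math.floor(alpha * len(x))
    if k < 1:
        raise ValueError("need floor(alpha * n) >= 1")
    return x[len(x) - k:].mean()

data = [5, 1, 4, 2, 3, 8, 7, 6]
low = cvar_lower(data, 0.25)     # averages the two smallest values, 1 and 2
high = cvar_upper(data, 0.25)    # averages the two largest values, 7 and 8
full = cvar_lower(data, 1.0)     # alpha = 1 reduces to the sample mean
print(low, high, full)
```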

Proof.

Recall $F_X(x) = \mathbb{P}[X \leq x]$ and define $F_X(x-) = \mathbb{P}[X < x]$. We make the following definitions for empirical cumulative distribution functions and their left limits:

\begin{align*}
\hat{F}_n(x) &= \#\{i \leq n \colon X_i \leq x\}/n\,,\\
\hat{F}_n(x-) &= \#\{i \leq n \colon X_i < x\}/n\,.
\end{align*}

Also let $\Delta F_X(x) = F_X(x) - F_X(x-)$ and $\Delta \hat{F}_n(x) = \hat{F}_n(x) - \hat{F}_n(x-)$. The key observation is that

$$E_n = \frac{1}{\lfloor \alpha n \rfloor} \sum_{i=1}^{n} X_i \min\left\{ \frac{(\lfloor \alpha n \rfloor - n \hat{F}_n(X_i-))_+}{n \Delta \hat{F}_n(X_i)}, 1 \right\}.$$

Indeed, any $x \in \mathbb{R}$ will appear in the sum defining $\lfloor \alpha n \rfloor E_n$ precisely $\min\{(\lfloor \alpha n \rfloor - n \hat{F}_n(x-))_+, n \Delta \hat{F}_n(x)\}$ times; the $n \Delta \hat{F}_n(X_i)$ in the denominator above takes care of overcounting. Now

\begin{align*}
\mathbb{E}[E_n] &= \frac{n}{\lfloor \alpha n \rfloor} \mathbb{E}\left[ X_1 \min\left\{ \frac{(\lfloor \alpha n \rfloor - n \hat{F}_n(X_1-))_+}{n \Delta \hat{F}_n(X_1)}, 1 \right\} \right]\\
&= \frac{n}{\lfloor \alpha n \rfloor} \mathbb{E}\left[ A_n(X_1) \right]
\end{align*}

where

$$A_n(x) := x\, \mathbb{E}\left[ \min\left\{ \frac{(\lfloor \alpha n \rfloor - n \hat{F}_{n-1}(x-))_+}{1 + n \Delta \hat{F}_{n-1}(x)}, 1 \right\} \right].$$

The first equality above follows from the linearity of the expectation and the i.i.d. property of $X_1, \dots, X_n$, and the second equality follows from conditioning on $X_1$. Using the strong law of large numbers we have $(\hat{F}_n(x), \hat{F}_n(x-), \Delta \hat{F}_n(x)) \to (F_X(x), F_X(x-), \Delta F_X(x))$ a.s. as $n \to \infty$. By separately considering the $\Delta F_X(x) = 0$ and $\Delta F_X(x) > 0$ cases we get

\begin{align*}
A_n(x) &\to x \cdot 1(F_X(x) < \alpha)\\
&\qquad + x\, \frac{\alpha - F_X(x-)}{\Delta F_X(x)}\, 1(\alpha \in (F_X(x-), F_X(x)))
\end{align*}

as $n \to \infty$ unless $\alpha = F_X(x) = F_X(x-)$; however, we have $\mathbb{P}[\alpha = F_X(X_1) = F_X(X_1-)] = 0$, so this case does not matter for evaluating the limit of $\mathbb{E}[E_n]$. Thus, by dominated convergence,

\begin{align*}
\mathbb{E}[E_n] &\to \alpha^{-1} \mathbb{E}[X_1; F_X(X_1) < \alpha]\\
&\qquad + \sum_{x \colon \alpha \in (F_X(x-), F_X(x))} x\,(1 - \alpha^{-1} F_X(x-))\\
&= \operatorname{CVaR}_{\alpha}(X)
\end{align*}

as $n \to \infty$. The second claim on the central limit theorem is a special case of cvarestimator_clt . ∎
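The key observation used in the proof, namely that $E_n$ can be written as a weighted sum over all samples with weights that account for ties, can be checked numerically. The sketch below is an illustration only; the function name `cvar_via_weights` and the test data are hypothetical.

```python
import math
import numpy as np

def cvar_via_weights(samples, alpha):
    """Evaluate E_n through the tie-aware weighted sum from the proof."""
    x = np.asarray(samples, dtype=float)
    k = math.floor(alpha * len(x))
    below = np.array([(x < v).sum() for v in x])    # n * F_hat_n(X_i -)
    ties = np.array([(x == v).sum() for v in x])    # n * Delta F_hat_n(X_i)
    weights = np.minimum(np.maximum(k - below, 0) / ties, 1.0)
    return (x * weights).sum() / k

def cvar_via_sorting(samples, alpha):
    """Evaluate E_n directly as the mean of the floor(alpha * n) smallest samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    return x[:math.floor(alpha * len(x))].mean()

# Small example with ties: the three smallest values are 1, 1, 2.
tied = [1, 1, 2, 2, 2, 3]
weighted = cvar_via_weights(tied, 0.5)
direct = cvar_via_sorting(tied, 0.5)

# Integer-valued random data produces many ties.
rng = np.random.default_rng(seed=5)
draws = rng.integers(0, 5, size=40)
gap = abs(cvar_via_weights(draws, 0.25) - cvar_via_sorting(draws, 0.25))
print(weighted, direct, gap)
```

In the small example, each of the three samples equal to 2 receives weight $1/3$, so together they contribute exactly one copy of the value 2 to the sum.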

Let us make the following remark on monotonicity: If $\phi \colon \mathbb{R} \to \mathbb{R}$ is non-decreasing, $\phi(X)$ is integrable, and $X'$ denotes an independent copy of $X$, then

\begin{align*}
0 &\geq \mathbb{E}[(\phi(X) - \phi(X'))(1(X \leq x) - 1(X' \leq x))]\\
&= 2 \cdot \mathbb{E}[\phi(X); X \leq x] - 2 \cdot \mathbb{E}[\phi(X)]\,\mathbb{P}[X \leq x]\,.
\end{align*}

By applying this to $\phi(x) = x$ and $x = x_\alpha$ we see that $\operatorname{CVaR}_{\alpha}(X) \leq \mathbb{E}[X]$. Furthermore, by replacing $X$ by a random variable sampled from the law of $X$ conditioned on $X \leq x_{\alpha'}$ for $\alpha' > \alpha$ we can deduce that $\operatorname{CVaR}_{\alpha}(X)$ is non-decreasing in $\alpha$. Much more crudely, we can bound $\operatorname{CVaRv}_{\alpha}(X) \leq \alpha^{-1} \mathbb{E}[X^2]/\mathbb{P}[X \leq x_\alpha] \leq \mathbb{E}[X^2]/\alpha^2$.
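The empirical counterparts of these three statements hold exactly for any finite sample and can be verified in a few lines. The sketch below uses a hypothetical standard normal sample and evaluates the lower-tail estimator for every attainable level $\alpha = k/n$ at once via cumulative sums.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
x = np.sort(rng.normal(size=200))    # hypothetical sample, sorted ascending
n = len(x)
k = np.arange(1, n + 1)              # k plays the role of floor(alpha * n)
alpha = k / n

lower_cvar = np.cumsum(x) / k        # E_n for every attainable alpha = k / n
second_moment = np.mean(x**2)
# Empirical analogue of alpha^{-1} Var[X | X <= x_alpha] from the k smallest samples.
tail_var = (np.cumsum(x**2) / k - lower_cvar**2) / alpha

print("non-decreasing in alpha:", bool(np.all(np.diff(lower_cvar) >= -1e-12)))
print("bounded by the mean    :", bool(lower_cvar.max() <= x.mean() + 1e-12))
print("crude variance bound   :", bool(np.all(tail_var <= second_moment / alpha**2 + 1e-9)))
```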

In the following, we analyze the behavior of the limiting distribution of the estimator $E_n$ in some concrete cases.

In the case where $X$ has a Bernoulli distribution with success probability $p$, we observe that $\overline{E}_n$ has the same distribution as $\min\{B_n/\lfloor \alpha n \rfloor, 1\}$, where $B_n$ is binomially distributed with parameters $(n, p)$. An application of the central limit theorem thus yields

$$\sqrt{n}\,(\overline{E}_n - \min\{p/\alpha, 1\}) \to \begin{cases} \alpha^{-1}\sqrt{p(1-p)}\,N & \colon \alpha > p\,,\\ \sqrt{(1-p)p^{-1}}\,N \cdot 1(N \geq 0) & \colon \alpha = p\,,\\ 0 & \colon \alpha < p \end{cases}$$

in distribution as $n\to\infty$, where $N$ is a standard normal random variable.
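As an illustration of the Bernoulli case, the following Python sketch assumes that $\overline{E}_{n}$ is the mean of the $\lfloor\alpha n\rfloor$ largest samples. The function name `upper_tail_mean` and the fixed sample are hypothetical and only serve to check the identity with $\min\{B_{n}/\lfloor\alpha n\rfloor,1\}$.

```python
import math


def upper_tail_mean(samples, alpha):
    """Mean of the floor(alpha * n) largest samples (assumed estimator form)."""
    n = len(samples)
    k = math.floor(alpha * n)
    if k < 1:
        raise ValueError("alpha * n must be at least 1")
    top = sorted(samples, reverse=True)[:k]
    return sum(top) / k


# Fixed Bernoulli sample with n = 10 draws and B_n = 3 successes.
samples = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
alpha = 0.5
b_n = sum(samples)
k = math.floor(alpha * len(samples))

estimate = upper_tail_mean(samples, alpha)
closed_form = min(b_n / k, 1.0)
```

For 0/1 samples the two quantities agree by construction: the top $\lfloor\alpha n\rfloor$ entries are all ones whenever $B_{n}\geq\lfloor\alpha n\rfloor$, and otherwise contain exactly $B_{n}$ ones.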

To analyze the case where $N\sim N(0,1)$, it will be useful to recall the following asymptotic expansion (nist_dlmf, (8.11(i))) of incomplete Gamma functions:

$$
\begin{aligned}
\Gamma(a,y)&:=\int_{y}^{\infty}s^{a-1}e^{-s}\,ds\\
&=y^{a-1}e^{-y}\left(\sum_{k=0}^{n-1}\frac{(a-1)\cdots(a-k)}{y^{k}}+O(y^{-n})\right)
\end{aligned}
$$

as $y\to\infty$ for any fixed $n\geq 1$ and $a>0$. In particular, as $x\to\infty$,

$$
\begin{aligned}
\frac{\Gamma(1/2,x^{2}/2)}{\sqrt{2}}&=\frac{e^{-x^{2}/2}}{x}\left(1-\frac{1}{x^{2}}+\frac{3}{x^{4}}+O(x^{-6})\right),\\
\frac{\sqrt{2}}{\Gamma(1/2,x^{2}/2)}&=xe^{x^{2}/2}\left(1+\frac{1}{x^{2}}-\frac{2}{x^{4}}+O(x^{-6})\right).
\end{aligned}
$$
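The first of these expansions can be checked numerically. The sketch below is an illustration and not part of the original derivation: it uses the identity $\Gamma(1/2,y)=\sqrt{\pi}\,\operatorname{erfc}(\sqrt{y})$ to evaluate the incomplete Gamma function with the Python standard library, and the helper names are hypothetical.

```python
import math


def upper_gamma_half(y):
    """Gamma(1/2, y) via the identity Gamma(1/2, y) = sqrt(pi) * erfc(sqrt(y))."""
    return math.sqrt(math.pi) * math.erfc(math.sqrt(y))


def expansion(x):
    """Three-term expansion of Gamma(1/2, x^2/2) / sqrt(2) for large x."""
    return math.exp(-x * x / 2) / x * (1 - 1 / x**2 + 3 / x**4)


def relative_error(x):
    """Relative deviation of the truncated expansion from the exact value."""
    exact = upper_gamma_half(x * x / 2) / math.sqrt(2)
    return abs(expansion(x) - exact) / exact
```

The relative error should shrink roughly like $x^{-6}$ as $x$ grows, in line with the stated remainder term.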

Let $x_{\alpha}=F_{N}^{-1}(\alpha)$ and write $f_{N}=F_{N}'$ for the density of $N$. By (nist_dlmf, (7.17(iii))) we get the asymptotic relationship

$$
x_{\alpha}\sim-\sqrt{-\log\bigl(4\pi\alpha^{2}\log(1/(2\alpha))\bigr)}
$$

as $\alpha\to 0$. We will compute $\operatorname{CVaR}_{\alpha}(N)$ and $\operatorname{CVaRv}_{\alpha}(N)$ via the cumulant generating function $\phi$ of a truncated Gaussian

$$
\begin{aligned}
\phi(\theta)&=\log\mathbb{E}[e^{\theta N}\mid N\leq x_{\alpha}]\\
&=-\log F_{N}(x_{\alpha})+\log\int_{-\infty}^{x_{\alpha}}\frac{e^{-t^{2}/2+\theta t}}{\sqrt{2\pi}}\,dt\\
&=-\log F_{N}(x_{\alpha})+\log\int_{-\infty}^{x_{\alpha}}\frac{e^{-(t-\theta)^{2}/2+\theta^{2}/2}}{\sqrt{2\pi}}\,dt\\
&=-\log F_{N}(x_{\alpha})+\theta^{2}/2+\log F_{N}(x_{\alpha}-\theta).
\end{aligned}
$$

Differentiating at $\theta=0$ yields the expressions

$$
\begin{aligned}
\operatorname{CVaR}_{\alpha}(N)&=\phi'(0)=-\frac{1}{\alpha}f_{N}(x_{\alpha}),\\
\alpha\operatorname{CVaRv}_{\alpha}(N)&=\phi''(0)=1+\frac{1}{\alpha}f_{N}'(x_{\alpha})-\frac{1}{\alpha^{2}}f_{N}(x_{\alpha})^{2}\\
&=1-\frac{1}{\alpha}x_{\alpha}f_{N}(x_{\alpha})-\frac{1}{\alpha^{2}}f_{N}(x_{\alpha})^{2}.
\end{aligned}
$$
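A quick way to sanity-check these closed forms is to differentiate $\phi$ numerically. The following sketch is an illustration under the definitions above: it evaluates $\phi(\theta)=-\log F_{N}(x_{\alpha})+\theta^{2}/2+\log F_{N}(x_{\alpha}-\theta)$ with `statistics.NormalDist` and compares central finite differences at $\theta=0$ with the expressions for $\operatorname{CVaR}_{\alpha}(N)$ and $\alpha\operatorname{CVaRv}_{\alpha}(N)$. The function names and the step size `h` are assumptions made for this example.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def cgf(theta, alpha):
    """Cumulant generating function of N conditioned on N <= x_alpha."""
    x_alpha = STD_NORMAL.inv_cdf(alpha)
    return (
        -math.log(STD_NORMAL.cdf(x_alpha))
        + theta**2 / 2
        + math.log(STD_NORMAL.cdf(x_alpha - theta))
    )


def closed_forms(alpha):
    """Return (CVaR_alpha(N), alpha * CVaRv_alpha(N)) from the closed forms."""
    x_alpha = STD_NORMAL.inv_cdf(alpha)
    ratio = STD_NORMAL.pdf(x_alpha) / alpha
    return -ratio, 1 - x_alpha * ratio - ratio**2


def finite_differences(alpha, h=1e-3):
    """Central-difference estimates of phi'(0) and phi''(0)."""
    first = (cgf(h, alpha) - cgf(-h, alpha)) / (2 * h)
    second = (cgf(h, alpha) - 2 * cgf(0.0, alpha) + cgf(-h, alpha)) / h**2
    return first, second
```

For $\alpha=1/2$ the closed forms reduce to the half-normal mean $-\sqrt{2/\pi}$ and variance $1-2/\pi$, which gives an independent check.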

Since $\alpha=F_{N}(x_{\alpha})=\Gamma(1/2,x_{\alpha}^{2}/2)/(2\sqrt{\pi})$, it follows that as $\alpha\to 0$,

$$
\begin{aligned}
\operatorname{CVaR}_{\alpha}(N)&=x_{\alpha}\left(1+\frac{1}{x_{\alpha}^{2}}-\frac{2}{x_{\alpha}^{4}}+O(x_{\alpha}^{-6})\right),\\
\operatorname{CVaRv}_{\alpha}(N)&=\frac{1}{\alpha}+\frac{x_{\alpha}^{2}}{\alpha}\left(1+\frac{1}{x_{\alpha}^{2}}-\frac{2}{x_{\alpha}^{4}}+O(x_{\alpha}^{-6})\right)\\
&\qquad-\frac{x_{\alpha}^{2}}{\alpha}\left(1+\frac{1}{x_{\alpha}^{2}}-\frac{2}{x_{\alpha}^{4}}+O(x_{\alpha}^{-6})\right)^{2}\\
&=\frac{1}{\alpha x_{\alpha}^{2}}+O(x_{\alpha}^{-4}).
\end{aligned}
$$
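The small-$\alpha$ behavior can also be inspected numerically. The sketch below is an illustration with hypothetical helper names: it compares the exact closed forms with the truncated series for $\operatorname{CVaR}_{\alpha}(N)$ (using the $-2/x_{\alpha}^{4}$ term inherited from the incomplete Gamma expansion), with the leading term $1/x_{\alpha}^{2}$ of $\alpha\operatorname{CVaRv}_{\alpha}(N)$, and with the asymptotic formula for $x_{\alpha}$.

```python
import math
from statistics import NormalDist


def quantile_asymptotic(alpha):
    """Leading-order approximation of x_alpha as alpha -> 0."""
    inner = 4 * math.pi * alpha**2 * math.log(1 / (2 * alpha))
    return -math.sqrt(-math.log(inner))


def gaussian_tail_summary(alpha):
    """Exact and asymptotic tail quantities of a standard normal variable."""
    normal = NormalDist()
    x = normal.inv_cdf(alpha)
    ratio = normal.pdf(x) / alpha  # f_N(x_alpha) / alpha
    return {
        "x_exact": x,
        "x_asymptotic": quantile_asymptotic(alpha),
        "cvar_exact": -ratio,
        "cvar_series": x * (1 + 1 / x**2 - 2 / x**4),
        "alpha_cvarv_exact": 1 - x * ratio - ratio**2,
        "alpha_cvarv_leading": 1 / x**2,
    }


summary = gaussian_tail_summary(1e-6)
```

The leading term $1/x_{\alpha}^{2}$ converges slowly because $x_{\alpha}$ grows only like $\sqrt{\log(1/\alpha)}$, so noticeable deviations from the exact value remain even for very small $\alpha$.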

As a final example, we can consider the case where $X$ has density $f_{X}(x)=\beta x^{-1-\beta}1(x\geq 1)$ with $\beta>0$ (i.e., we consider a power-law tail). Here, one can compute that for $\beta>2$, $\overline{\operatorname{CVaRv}}_{\alpha}(X)=\beta(\beta-1)^{-2}(\beta-2)^{-1}\alpha^{-2/\beta}$, which is worse than the decay in the standard normal case and achieves the worst-case upper bound on the variance in the $\beta\to 2$ limit.

Appendix D Relation to brute-force search

A brute-force search enumerates all $2^{n}$ candidate solutions and checks which one is optimal. The sampling overhead of $\sqrt{\gamma}$ on noisy devices can thus be related to brute-force search, thereby allowing us to derive a hardware requirement for QAOA. Assuming, for simplicity, that the probability $p_{x}$ to sample the optimal solution is close to $1$, we require hardware with $\sqrt{\gamma}<2^{n}$. We can relate this to the layer fidelity to obtain a requirement on the hardware quality necessary for potential quantum advantage. First, we assume that each layer $i$ in a QAOA circuit has the same layer fidelity ${\rm LF}=1/\sqrt{\gamma_{i}}$. As a result, the $\gamma$ of the circuit is $\gamma=\prod_{i=1}^{d(n)}\gamma_{i}=1/{\rm LF}^{2d(n)}$, where $d(n)$ is the depth defined as the number of non-overlapping two-qubit gate layers. This assumption is reasonable when transpiling QAOA circuits to a line of qubits, which requires layers of CNOT gates applied on every other edge weidenfeller2022scaling . Therefore, the sampling cost to compensate for noise is $1/{\rm LF}^{d(n)}$.
For a line of qubits we may assume that to leading order $d(n)\sim 3np$. The factor $3n$ comes from the fact that $n-2$ layers of SWAP gates are needed to implement full connectivity and each SWAP merged with an $R_{ZZ}$ is implemented with three CNOT gates. Here, $p$ is the number of QAOA layers, which is sometimes assumed to grow with the logarithm of the problem size, i.e., $p\propto\log(n)$ Bravyi2019 ; weidenfeller2022scaling . If the sampling overhead should stay below brute-force search, we therefore require ${\rm LF}^{-3np}<2^{n}$, which implies that the layer fidelity must satisfy

$$
{\rm LF}>\frac{1}{2^{1/(3p)}}. \tag{33}
$$

This requirement depends on the problem size only through the relation between $p$ and $n$. However, as shown in Ref. mckay2023benchmarking , the layer fidelity decreases with the number of qubits in the layer. If we further assume that layers are dense, i.e., every layer on $n$ qubits consists of approximately $n/2$ CNOT gates, we can compute a corresponding CNOT fidelity as ${\rm LF}^{2/n}$, as well as the corresponding lower bound

$$
{\rm LF}^{2/n}>\frac{1}{2^{2/(3pn)}}. \tag{34}
$$
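To make the thresholds in Eqs. (33) and (34) concrete, the following Python sketch evaluates them under the same assumptions as above (identical layer fidelities, leading-order depth $d(n)\sim 3np$, dense layers with roughly $n/2$ CNOT gates). The function names are illustrative and not taken from any library.

```python
def min_layer_fidelity(p):
    """Threshold of Eq. (33): LF must exceed 2^(-1/(3p))."""
    return 2.0 ** (-1.0 / (3 * p))


def min_cnot_fidelity(n, p):
    """Threshold of Eq. (34): LF^(2/n) must exceed 2^(-2/(3pn))."""
    return 2.0 ** (-2.0 / (3 * p * n))


def sampling_overhead(layer_fidelity, n, p):
    """Sampling cost LF^(-d(n)) with the leading-order depth d(n) = 3np."""
    return layer_fidelity ** (-3 * n * p)


# Thresholds for a 40-qubit problem with p = 1 and p = 2 QAOA layers.
thresholds = {
    p: (min_layer_fidelity(p), min_cnot_fidelity(40, p)) for p in (1, 2)
}
```

At the threshold layer fidelity the sampling overhead equals $2^{n}$, i.e., the cost of brute-force search, so any useful device has to sit strictly above these values.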

Appendix E 40-qubit Circuits

The 40-qubit circuits in the main text are based on those in Ref. Sack2023 . In that work, the authors consider random three-regular graphs transpiled to a line of qubits using a swap network weidenfeller2022scaling . This results in circuits that alternate only two types of layers of CNOT gates, as described in the main text. Furthermore, the authors carefully chose the mapping from decision variables to physical qubits to minimize the number of layers of the swap network. This method is described in Ref. Matsuo2023 . The code to produce such circuits is available on GitHub BestPractices . The optimal parameters resulting from the light-cone optimization are $(\gamma_{1},\beta_{1})=(2.8405,0.3982)$ for $p=1$ and $(\gamma_{1},\beta_{1},\gamma_{2},\beta_{2})=(1.1506,0.3288,0.1941,0.6582)$ for $p=2$.

Appendix F Dynamical Decoupling

Dynamical decoupling (DD) removes an interaction between a system and a bath by inserting pulses Viola1998 ; Zanardi1999 ; Vitali1999 . Here, we briefly summarize DD following Ref. Ezzell2023 . Consider a time-independent bath $H_{B}$ interacting with the system $H_{S}=H_{S}^{0}+H_{S}^{1}$ through $H_{SB}$. Here, $H_{S}^{1}$ is an undesired, always-on error term. The goal of DD is to insert pulses in idle times such that the time evolution of the system and bath becomes $U(T)=U_{0}(T)B(T)$, where $U_{0}=\exp(-iTH_{S}^{0})\otimes\mathbb{I}_{B}$ is the desired error-free time evolution and $B(T)$ ideally acts on the bath alone.

Consider a single qubit with $H_{B}+H_{SB}=\sum_{\alpha=0}^{3}\gamma_{\alpha}\sigma^{\alpha}\otimes B^{\alpha}$. Here, $\gamma_{\alpha}$ is a coefficient, and $B^{\alpha}$ is the bath term that couples to the qubit through the $\sigma^{\alpha}$ Pauli matrix. The simplest DD sequence is ${\rm PX}=X-d_{\tau}-X-d_{\tau}$, where $d_{\tau}$ indicates a delay of duration $\tau$. Since $X$ anti-commutes with $Y$ and $Z$, the sequence ${\rm PX}$ cancels the $Y\otimes B^{Y}$ and $Z\otimes B^{Z}$ system-bath interactions.
The effective error Hamiltonian after a duration $2\tau$ is $H^{\text{err}}_{\rm PX}=\gamma_{x}X\otimes B^{x}+\mathbb{I}_{s}\otimes\tilde{B}+\mathcal{O}(\tau^{2})$. Here, we see that ${\rm PX}$ is not universal since an $X$ error remains. Universal decoupling up to first order is achieved with the ${\rm XY4}$ sequence

$$
{\rm XY4}=Y-d_{\tau}-X-d_{\tau}-Y-d_{\tau}-X-d_{\tau} \tag{35}
$$

which results in the effective error Hamiltonian $H^{\text{err}}_{\rm XY4}=\mathbb{I}_{s}\otimes\tilde{B}+\mathcal{O}(\tau^{2})$.

We now consider the two-qubit case. Two fixed-frequency qubits typically exhibit an undesired $ZZ$-coupling, which is effectively suppressed with DD Tripathi2022 ; Mundada2023 . Simultaneously applying the ${\rm PX}$ sequence on both qubits cancels unwanted errors arising from $\mathbb{I}\otimes Z$ and $Z\otimes\mathbb{I}$. However, since simultaneous $X$ pulses commute with $Z\otimes Z$, the unwanted $ZZ$ interactions (i.e., cross-talk, which is common in transmon qubits) are still present. This is remedied with staggered DD. We apply the sequence

$$
X_{1}-d_{\tau}-X_{0}-d_{\tau}-X_{1}-d_{\tau}-X_{0}-d_{\tau} \tag{36}
$$

which staggers two ${\rm PX}$ sequences. Here, $X_{i}$ is an $X$ gate applied to qubit $i$, which inverts the evolution of $Z_{i}$ and $Z_{1}\otimes Z_{0}$. In total, the evolution of single-qubit $Z_{i}$ errors changes sign twice and the evolution of $ZZ$ errors changes sign four times. In this work we apply the staggered XY4 sequence zhou_quantum_2022 (a variant of the staggered XX sequence presented in Mundada2023 ) to ensure a proper cancellation of two-qubit static cross-talk. The staggered XY4 sequence we employ is defined by $Y_{0}-d_{\tau}-Y_{1}-d_{\tau}-X_{0}-d_{\tau}-X_{1}-d_{\tau}-Y_{0}-d_{\tau}-Y_{1}-d_{\tau}-X_{0}-d_{\tau}-X_{1}-d_{\tau}$. As discussed above, it is universal for single-qubit terms and will cancel the static $ZZ$ cross-talk between qubits.
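The first-order cancellations described in this appendix can be illustrated with a small sign-tracking sketch. It is a simplified model and not the analysis of Ref. Ezzell2023 : bath operators are replaced by scalar coefficients, pulses are ideal and instantaneous, every pulse is followed by a delay of equal duration, and only the first-order average over the toggling frames is computed. Pauli strings list qubit 0 first, and all names are hypothetical.

```python
def anticommutes(pulse, term):
    """Pauli strings anticommute iff they clash on an odd number of sites."""
    clashes = sum(
        1 for a, b in zip(pulse, term) if a != "I" and b != "I" and a != b
    )
    return clashes % 2 == 1


def first_order_average(error_terms, pulses):
    """Average each error coefficient over the toggling frames of a sequence.

    error_terms maps a Pauli string to a scalar coefficient; pulses lists the
    Pauli strings applied in time order, each followed by an equal delay.
    """
    signs = {term: 1 for term in error_terms}
    totals = {term: 0.0 for term in error_terms}
    for pulse in pulses:
        for term, coeff in error_terms.items():
            if anticommutes(pulse, term):
                signs[term] = -signs[term]  # the pulse flips this term's sign
            totals[term] += signs[term] * coeff
    return {term: totals[term] / len(pulses) for term in error_terms}


# Single qubit: PX leaves the X term, XY4 removes all three terms.
single = {"X": 0.3, "Y": 0.5, "Z": 0.7}
px = first_order_average(single, ["X", "X"])
xy4 = first_order_average(single, ["Y", "X", "Y", "X"])

# Two qubits: simultaneous PX keeps ZZ, the staggered sequence (36) removes it.
zz_terms = {"ZI": 0.2, "IZ": 0.4, "ZZ": 0.6}
simultaneous = first_order_average(zz_terms, ["XX", "XX"])
staggered = first_order_average(zz_terms, ["IX", "XI", "IX", "XI"])

# Staggered XY4 applied to all single-qubit terms and the static ZZ term.
local_and_zz = {t: 1.0 for t in ("XI", "YI", "ZI", "IX", "IY", "IZ", "ZZ")}
staggered_xy4 = first_order_average(
    local_and_zz, ["YI", "IY", "XI", "IX"] * 2
)
```

In this toy model a term survives exactly when its accumulated sign does not average to zero over the sequence, which mirrors the sign-counting argument used in the text.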

Appendix G 127-qubit QAOA Circuits

Figure 6: From Refs. pelofske2023qavsqaoa ; pelofske2023short : Diagram of a heavy-hex graph compatible $p=1$ QAOA circuit for sampling heavy-hex compatible higher-order Ising models (specifically cubic terms centered on degree-2 nodes). The left-hand side of the figure shows the 3-edge-coloring and the bipartition of the graph, and the cubic terms are denoted by the adjacent purple lines next to the hardware graph. The right-hand side of the figure shows the corresponding QAOA circuit for this sub-component of the heavy-hex graph (which can be extended arbitrarily to a large heavy-hex graph, and to higher $p$). The cubic terms are addressed by a single layer of Rx rotation gates (shown in purple), and the total CNOT depth per $p$ is always $6$. Following the phase separator, the transverse field mixer is applied, and the states of all qubits are measured after $p$ rounds have been applied.

In Figure 6, taken from Refs. pelofske2023qavsqaoa ; pelofske2023short , we briefly describe the optimized circuits for the 127-qubit higher-order instances to keep the description self-contained. The figure illustrates that all two-qubit gates needed for the implementation of $e^{-i\gamma H}$ can be scheduled in just 3 different layers of non-overlapping CNOT gates. In each QAOA round, each layer is used once to compute and once to uncompute $ZZ$ and $ZZZ$ parity values, for an overall CNOT depth of $6p$. The exact values of the QAOA angles, computed heuristically using parameter transfer, that give a strictly increasing expectation value as $p$ increases up to $5$ are given in Ref. heavy_hex_QAOA_parameter_transfer2023 .

References

  • (1) A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O’Brien. A variational eigenvalue solver on a photonic quantum processor. Nature Communications, 5(1), 2014. DOI: 10.1038/ncomms5213.
  • (2) P. J. Ollitrault, A. Miessen, and I. Tavernelli. Molecular Quantum Dynamics: A Quantum Computing Perspective. Accounts of Chemical Research, 54(23):4229–4238, 2021. DOI: 10.1021/acs.accounts.1c00514. PMID: 34787398.
  • (3) A. D. Meglio, K. Jansen, I. Tavernelli, C. Alexandrou, S. Arunachalam, C. W. Bauer, K. Borras, S. Carrazza, A. Crippa, V. Croft, R. de Putter, A. Delgado, V. Dunjko, D. J. Egger, E. Fernandez-Combarro, E. Fuchs, L. Funcke, D. Gonzalez-Cuadra, M. Grossi, J. C. Halimeh, Z. Holmes, S. Kuhn, D. Lacroix, R. Lewis, D. Lucchesi, M. L. Martinez, F. Meloni, A. Mezzacapo, S. Montangero, L. Nagano, V. Radescu, E. R. Ortega, A. Roggero, J. Schuhmacher, J. Seixas, P. Silvi, P. Spentzouris, F. Tacchino, K. Temme, K. Terashi, J. Tura, C. Tuysuz, S. Vallecorsa, U.-J. Wiese, S. Yoo, and J. Zhang. Quantum Computing for High-Energy Physics: State of the Art and Challenges. Summary of the QC4HEP Working Group, 2023. DOI: 10.48550/arXiv.2307.03236.
  • (4) P. K. Barkoutsos, F. Gkritsis, P. J. Ollitrault, I. O. Sokolov, S. Woerner, and I. Tavernelli. Quantum algorithm for alchemical optimization in material design. Chemical Science, 12(12):4345–4352, 2021. DOI: 10.1039/D0SC05718E.
  • (5) V. Havlicek, A. D. Corcoles, K. Temme, A. W. Harrow, A. Kandala, J. M. Chow, and J. M. Gambetta. Supervised learning with quantum-enhanced feature spaces. Nature, 567:209 – 212, 2019. DOI: 10.1038/s41586-019-0980-2.
  • (6) C. Zoufal, A. Lucchi, and S. Woerner. Quantum Generative Adversarial Networks for learning and loading random distributions. npj Quantum Information, 5(1), 2019. DOI: 10.1038/s41534-019-0223-2.
  • (7) C. Zoufal, A. Lucchi, and S. Woerner. Variational quantum Boltzmann machines. Quantum Machine Intelligence, 3(1), 2021. DOI: 10.1007/s42484-020-00033-7.
  • (8) E. Farhi, J. Goldstone, and S. Gutmann. A Quantum Approximate Optimization Algorithm, 2014. DOI: 10.48550/arXiv.1411.4028.
  • (9) S. Bravyi, A. Kliesch, R. Koenig, and E. Tang. Obstacles to Variational Quantum Optimization from Symmetry Protection. Physical Review Letters, 125(26):260505, 2020. DOI: 10.1103/PhysRevLett.125.260505.
  • (10) D. J. Egger, J. Mareček, and S. Woerner. Warm-starting quantum optimization. Quantum, 5:479, 2021. DOI: 10.22331/q-2021-06-17-479.
  • (11) S. H. Sack and D. J. Egger. Large-scale quantum approximate optimization on non-planar graphs with machine learning noise mitigation, 2023. DOI: 10.48550/arXiv.2307.14427.
  • (12) S. Woerner and D. J. Egger. Quantum risk analysis. npj Quantum Information, 5(1), 2019. DOI: 10.1038/s41534-019-0130-6.
  • (13) E. Yndurain, S. Woerner, and D. Egger. Exploring quantum computing use cases for financial services, 2019. Available online: https://www.ibm.com/downloads/cas/2YPRZPB3. [Online; downloaded 21-Nov-2023].
  • (14) N. Stamatopoulos, G. Mazzola, S. Woerner, and W. J. Zeng. Towards quantum advantage in financial market risk using quantum gradient algorithms. Quantum, 6:770, 2022. DOI: 10.22331/q-2022-07-20-770.
  • (15) M. A. Nielsen and I. L. Chuang. Quantum Computation and Quantum Information: 10th Anniversary Edition. Cambridge University Press, 2011.
  • (16) D. A. Lidar and T. A. Brun. Quantum Error Correction. Cambridge University Press, 2013. DOI: 10.1017/CBO9781139034807.
  • (17) E. van den Berg, Z. K. Minev, A. Kandala, and K. Temme. Probabilistic error cancellation with sparse Pauli-Lindblad models on noisy quantum processors. Nature Physics, 19:1116–1121, 2023. DOI: 10.1038/s41567-023-02042-2.
  • (18) C. Piveteau, D. Sutter, and S. Woerner. Quasiprobability decompositions with reduced sampling overhead. npj Quantum Information, 8(1), 2022. DOI: 10.1038/s41534-022-00517-3.
  • (19) K. Temme, S. Bravyi, and J. M. Gambetta. Error Mitigation for Short-Depth Quantum Circuits. Physical Review Letters, 119(18), 2017. DOI: 10.1103/physrevlett.119.180509.
  • (20) Y. Quek, D. S. França, S. Khatri, J. J. Meyer, and J. Eisert. Exponentially tighter bounds on limitations of quantum error mitigation, 2023. DOI: 10.48550/arXiv.2210.11505.
  • (21) Y. Kim, A. Eddins, S. Anand, K. X. Wei, E. van den Berg, S. Rosenblatt, H. Nayfeh, Y. Wu, M. Zaletel, K. Temme, and A. Kandala. Evidence for the utility of quantum computing before fault tolerance. Nature, 618:500–505, 2023. DOI: 10.1038/s41586-023-06096-3.
  • (22) S. Anand, K. Temme, A. Kandala, and M. Zaletel. Classical benchmarking of zero noise extrapolation beyond the exactly-verifiable regime, 2023. DOI: 10.48550/arXiv.2306.17839.
  • (23) S. Bravyi, O. Dial, J. M. Gambetta, D. Gil, and Z. Nazario. The future of quantum computing with superconducting qubits. Journal of Applied Physics, 132(16), 2022. DOI: 10.1063/5.0082975.
  • (24) C. Zoufal, R. V. Mishmash, N. Sharma, N. Kumar, A. Sheshadri, A. Deshmukh, N. Ibrahim, J. Gacon, and S. Woerner. Variational quantum algorithm for unconstrained black box binary optimization: Application to feature selection. Quantum, 7:909, 2023. DOI: 10.22331/q-2023-01-26-909.
  • (25) A. Letcher, S. Woerner, and C. Zoufal. From Tight Gradient Bounds for Parameterized Quantum Circuits to the Absence of Barren Plateaus in QGANs, 2023. DOI: 10.48550/arXiv.2309.12681.
  • (26) P. K. Barkoutsos, G. Nannicini, A. Robert, I. Tavernelli, and S. Woerner. Improving variational quantum optimization using CVaR. Quantum, 4:256, 2020. DOI: 10.22331/q-2020-04-20-256.
  • (27) J. Wurtz and P. Love. MaxCut quantum approximate optimization algorithm performance guarantees for $p>1$. Physical Review A, 103(4):042612, 2021. DOI: 10.1103/PhysRevA.103.042612.
  • (28) E. Knill, D. Leibfried, R. Reichle, J. Britton, R. B. Blakestad, J. D. Jost, C. Langer, R. Ozeri, S. Seidelin, and D. J. Wineland. Randomized benchmarking of quantum gates. Physical Review A, 77(1), 2008. DOI: 10.1103/PhysRevA.77.012307. Publisher: American Physical Society.
  • (29) C. Dankert, R. Cleve, J. Emerson, and E. Livine. Exact and approximate unitary 2-designs and their application to fidelity estimation. Physical Review A, 80(1):012304, 2009. DOI: 10.1103/PhysRevA.80.012304. Publisher: American Physical Society.
  • (30) E. Magesan, J. M. Gambetta, and J. Emerson. Scalable and Robust Randomized Benchmarking of Quantum Processes. Physical Review Letters, 106(18):180504, 2011. DOI: 10.1103/PhysRevLett.106.180504. Publisher: American Physical Society.
  • (31) S. Kokoska and D. Zwillinger. CRC Standard Probability and Statistics Tables and Formulae. CRC Press, 2000. DOI: 10.1201/b16923.
  • (32) D. C. McKay, I. Hincks, E. J. Pritchett, M. Carroll, L. C. G. Govia, and S. T. Merkel. Benchmarking Quantum Processor Performance at Scale, 2023. DOI: 10.48550/arXiv.2311.05933.
  • (33) E. van den Berg, Z. K. Minev, and K. Temme. Model-free readout-error mitigation for quantum expectation values. Physical Review A, 105(3), 2022. DOI: 10.1103/physreva.105.032620.
  • (34) P. D. Nation, H. Kang, N. Sundaresan, and J. M. Gambetta. Scalable Mitigation of Measurement Errors on Quantum Computers. PRX Quantum, 2(4), 2021. DOI: 10.1103/prxquantum.2.040326.
  • (35) X. Bonet-Monroig, R. Sagastizabal, M. Singh, and T. E. O'Brien. Low-cost error mitigation by symmetry verification. Physical Review A, 98(6), 2018. DOI: 10.1103/physreva.98.062339.
  • (36) A. Choquette, A. Di Paolo, P. K. Barkoutsos, D. Sénéchal, I. Tavernelli, and A. Blais. Quantum-optimal-control-inspired ansatz for variational quantum algorithms. Physical Review Research, 3(2), 2021. DOI: 10.1103/physrevresearch.3.023092.
  • (37) J. Weidenfeller, L. C. Valor, J. Gacon, C. Tornow, L. Bello, S. Woerner, and D. J. Egger. Scaling of the quantum approximate optimization algorithm on superconducting qubit based hardware. Quantum, 6:870, 2022. DOI: 10.22331/q-2022-12-07-870.
  • (38) G. Gentinetta, A. Thomsen, D. Sutter, and S. Woerner. The complexity of quantum support vector machines, 2022. DOI: 10.48550/arXiv.2203.00031.
  • (39) G. Gentinetta, D. Sutter, C. Zoufal, B. Fuller, and S. Woerner. Quantum Kernel Alignment with Stochastic Gradient Descent, 2023. DOI: 10.48550/arXiv.2304.09899.
  • (40) S. McArdle, T. Jones, S. Endo, Y. Li, S. C. Benjamin, and X. Yuan. Variational ansatz-based quantum simulation of imaginary time evolution. npj Quantum Information, 5(1), 2019. DOI: 10.1038/s41534-019-0187-2.
  • (41) X. Yuan, S. Endo, Q. Zhao, Y. Li, and S. C. Benjamin. Theory of variational quantum simulation. Quantum, 3:191, 2019. DOI: 10.22331/q-2019-10-07-191.
  • (42) C. Zoufal, D. Sutter, and S. Woerner. Error bounds for variational quantum time evolution. Physical Review Applied, 20(4), 2023. DOI: 10.1103/physrevapplied.20.044059.
  • (43) J. Gacon, C. Zoufal, G. Carleo, and S. Woerner. Simultaneous Perturbation Stochastic Approximation of the Quantum Fisher Information. Quantum, 5:567, 2021. DOI: 10.22331/q-2021-10-20-567.
  • (44) J. Gacon, C. Zoufal, G. Carleo, and S. Woerner. Stochastic Approximation of Variational Quantum Imaginary Time Evolution, 2023. DOI: 10.48550/arXiv.2305.07059.
  • (45) J. Gacon, J. Nys, R. Rossi, S. Woerner, and G. Carleo. Variational quantum time evolution without the quantum geometric tensor, 2023. DOI: 10.48550/arXiv.2303.12839.
  • (46) B. Fuller, C. Hadfield, J. R. Glick, T. Imamichi, T. Itoko, R. J. Thompson, Y. Jiao, M. M. Kagele, A. W. Blom-Schieber, R. Raymond, and A. Mezzacapo. Approximate Solutions of Combinatorial Problems via Quantum Relaxations, 2021. DOI: 10.48550/arXiv.2111.03167.
  • (47) K. Teramoto, R. Raymond, E. Wakakuwa, and H. Imai. Quantum-Relaxation Based Optimization Algorithms: Theoretical Extensions, 2023. DOI: 10.48550/arXiv.2302.09481.
  • (48) T. L. Patti, J. Kossaifi, A. Anandkumar, and S. F. Yelin. Variational quantum optimization with multibasis encodings. Physical Review Research, 4(3):033142, 2022. DOI: 10.1103/PhysRevResearch.4.033142.
  • (49) A. Lucas. Ising formulations of many NP problems. Frontiers in Physics, 2, 2014. DOI: 10.3389/fphy.2014.00005.
  • (50) M. Streif and M. Leib. Training the quantum approximate optimization algorithm without access to a quantum processing unit. Quantum Science and Technology, 5(3):034008, 2020. DOI: 10.1088/2058-9565/ab8c2b.
  • (51) S. H. Sack and M. Serbyn. Quantum annealing initialization of the quantum approximate optimization algorithm. Quantum, 5:491, 2021. DOI: 10.22331/q-2021-07-01-491.
  • (52) T. Begušić, K. Hejazi, and G. K.-L. Chan. Simulating quantum circuit expectation values by clifford perturbation theory, 2023. DOI: 10.48550/arXiv.2306.04797.
  • (53) S. Hadfield, Z. Wang, B. O’Gorman, E. G. Rieffel, D. Venturelli, and R. Biswas. From the Quantum Approximate Optimization Algorithm to a Quantum Alternating Operator Ansatz. Algorithms, 12(2):34, 2019. DOI: 10.3390/a12020034.
  • (54) Z. Wang, N. C. Rubin, J. M. Dominy, and E. G. Rieffel. $XY$ mixers: Analytical and numerical results for the quantum alternating operator ansatz. Physical Review A, 101(1):012320, 2020. DOI: 10.1103/PhysRevA.101.012320.
  • (55) J. Cook, S. Eidenbenz, and A. Bärtschi. The Quantum Alternating Operator Ansatz on Maximum k-Vertex Cover. In IEEE International Conference on Quantum Computing & Engineering QCE’20, pages 83–92, 2020. DOI: 10.1109/QCE49297.2020.00021.
  • (56) J. Golden, A. Bärtschi, S. Eidenbenz, and D. O’Malley. Numerical Evidence for Exponential Speed-up of QAOA over Unstructured Search for Approximate Constrained Optimization. In IEEE International Conference on Quantum Computing & Engineering QCE’23, pages 496–505, 2023. DOI: 10.1109/QCE57702.2023.00063.
  • (57) A. Bärtschi and S. Eidenbenz. Short-Depth Circuits for Dicke State Preparation. In IEEE International Conference on Quantum Computing & Engineering QCE’22, pages 87–96, 2022. DOI: 10.1109/QCE53715.2022.00027.
  • (58) A. Bärtschi and S. Eidenbenz. Grover Mixers for QAOA: Shifting Complexity from Mixer Design to State Preparation. In IEEE International Conference on Quantum Computing & Engineering QCE’20, pages 72–82, 2020. DOI: 10.1109/QCE49297.2020.00020.
  • (59) Y. Liu, S. Arunachalam, and K. Temme. A rigorous and robust quantum speed-up in supervised machine learning. Nature Physics, 17(9):1013–1017, 2021. DOI: 10.1038/s41567-021-01287-z.
  • (60) IBM Quantum. IBM Quantum Platform - Compute resources. https://quantum-computing.ibm.com/services/resources, 2023. [Online; accessed 20-Nov-2023].
  • (61) S. Sheldon, E. Magesan, J. M. Chow, and J. M. Gambetta. Procedure for systematically tuning up cross-talk in the cross-resonance gate. Physical Review A, 93(6):060302, 2016. DOI: 10.1103/PhysRevA.93.060302.
  • (62) At the time of writing the experiment to measure layer fidelity is under implementation in Qiskit Experiments QiskitExperiments . See https://github.com/Qiskit-Extensions/qiskit-experiments.
  • (63) E. Pelofske, A. Bärtschi, and S. Eidenbenz. Quantum Annealing vs. QAOA: 127 Qubit Higher-Order Ising Problems on NISQ Computers. In International Conference on High Performance Computing ISC HPC’23, pages 240–258, 2023. DOI: 10.1007/978-3-031-32041-5_13.
  • (64) E. Pelofske, A. Bärtschi, and S. Eidenbenz. Short-Depth QAOA circuits and Quantum Annealing on Higher-Order Ising Models. npj Quantum Information, 2023. DOI: 10.2172/1985256. Accepted.
  • (65) C. Chamberland, G. Zhu, T. J. Yoder, J. B. Hertzberg, and A. W. Cross. Topological and subsystem codes on low-degree graphs with flag qubits. Physical Review X, 10(1), 2020. DOI: 10.1103/physrevx.10.011022.
  • (66) E. Pelofske, A. Bärtschi, L. Cincio, J. Golden, and S. Eidenbenz. Scaling Whole-Chip QAOA for Higher-Order Ising Spin Glass Models on Heavy-Hex Graphs, 2023. LANL report LA-UR-23-33192; to appear.
  • (67) IBM ILOG CPLEX. V12.10.0: User’s Manual for CPLEX. International Business Machines Corporation, 46(53):157, 2009.
  • (68) J. J. Wallman. Bounding experimental quantum error rates relative to fault-tolerant thresholds, 2016. DOI: 10.48550/arXiv.1511.00727.
  • (69) J. J. Wallman and J. Emerson. Noise tailoring for scalable quantum computation via randomized compiling. Physical Review A, 94(5), 2016. DOI: 10.1103/physreva.94.052325.
  • (70) J. Dedecker and F. Merlevède. Central limit theorem and almost sure results for the empirical estimator of superquantiles/CVaR in the stationary case. Statistics, 56(1):53–72, 2022. DOI: 10.1080/02331888.2022.2043325.
  • (71) NIST Digital Library of Mathematical Functions. Release 1.1.11 of 2023-09-15. Available online: https://dlmf.nist.gov/. F. W. J. Olver, A. B. Olde Daalhuis, D. W. Lozier, B. I. Schneider, R. F. Boisvert, C. W. Clark, B. R. Miller, B. V. Saunders, H. S. Cohl, and M. A. McClain, eds.
  • (72) A. Matsuo, S. Yamashita, and D. J. Egger. A SAT Approach to the Initial Mapping Problem in SWAP Gate Insertion for Commuting Gates. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E106.A(11):1424–1431, 2023. DOI: 10.1587/transfun.2022eap1159.
  • (73) Best practices in quantum optimization. Available online: https://github.com/qiskit-community/qopt-best-practices.
  • (74) L. Viola and S. Lloyd. Dynamical suppression of decoherence in two-state quantum systems. Physical Review A, 58(4):2733, 1998. DOI: 10.1103/PhysRevA.58.2733.
  • (75) P. Zanardi. Symmetrizing evolutions. Physics Letters A, 258(2–3):77–82, 1999. DOI: 10.1016/S0375-9601(99)00365-5.
  • (76) D. Vitali and P. Tombesi. Using parity kicks for decoherence control. Physical Review A, 59(6):4178, 1999. DOI: 10.1103/PhysRevA.59.4178.
  • (77) N. Ezzell, B. Pokharel, L. Tewala, G. Quiroz, and D. A. Lidar. Dynamical decoupling for superconducting qubits: a performance survey, 2023. DOI: 10.48550/arXiv.2207.03670.
  • (78) V. Tripathi, H. Chen, M. Khezri, K.-W. Yip, E. Levenson-Falk, and D. A. Lidar. Suppression of Crosstalk in Superconducting Qubits Using Dynamical Decoupling. Physical Review Applied, 18(2):024068, 2022. DOI: 10.1103/PhysRevApplied.18.024068.
  • (79) P. S. Mundada, A. Barbosa, S. Maity, Y. Wang, T. Merkh, T. Stace, F. Nielson, A. R. Carvalho, M. Hush, M. J. Biercuk, and Y. Baum. Experimental Benchmarking of an Automated Deterministic Error-Suppression Workflow for Quantum Algorithms. Physical Review Applied, 20(2):024034, 2023. DOI: 10.1103/PhysRevApplied.20.024034.
  • (80) Z. Zhou, R. Sitler, Y. Oda, K. Schultz, and G. Quiroz. Quantum Crosstalk Robust Quantum Control, 2023. DOI: 10.1103/PhysRevLett.131.210802.
  • (81) N. Kanazawa, D. J. Egger, Y. Ben-Haim, H. Zhang, W. E. Shanks, G. Aleksandrowicz, and C. J. Wood. Qiskit experiments: A python package to characterize and calibrate quantum computers. Journal of Open Source Software, 8(84):5329, 2023. DOI: 10.21105/joss.05329.