Provable bounds for noise-free expectation values computed from noisy samples

Samantha V. Barron IBM Quantum, IBM T.J. Watson Research Center Daniel J. Egger IBM Quantum, IBM Research Europe - Zurich Elijah Pelofske Los Alamos National Laboratory Andreas Bärtschi Los Alamos National Laboratory Stephan Eidenbenz Los Alamos National Laboratory Matthis Lehmkuehler University of Basel Stefan Woerner IBM Quantum, IBM Research Europe - Zurich

(December 1, 2023)

Abstract

In this paper, we explore the impact of noise on quantum computing, particularly focusing on the challenges when sampling bit strings from noisy quantum computers as well as the implications for optimization and machine learning applications. We formally quantify the sampling overhead to extract good samples from noisy quantum computers and relate it to the layer fidelity, a metric to determine the performance of noisy quantum processors. Further, we show how this allows us to use the Conditional Value at Risk of noisy samples to determine provable bounds on noise-free expectation values. We discuss how to leverage these bounds for different algorithms and demonstrate our findings through experiments on a real quantum computer involving up to 127 qubits. The results show a strong alignment with theoretical predictions.

I Introduction

Quantum computing is a new computational paradigm which promises to impact many disciplines, ranging from quantum chemistry peruzzo_2014_vqe ; ollitrault_2021_dynamics , quantum physics dimeglio2023quantum , and material sciences barkoutsos_2021_alchemical , to machine learning Havlicek2019 ; Zoufal_2019_qgan ; Zoufal_2021_varqbm , optimization farhi_2014_qaoa ; Bravyi2019 ; egger2021warm ; Sack2023 , and finance Woerner_2019_risk ; yndurain_2019_quantum_finance ; Stamatopoulos_2022_market_risk . However, leveraging near-term quantum computers is difficult due to the noise present in the systems. Ultimately, this needs to be addressed by quantum error correction, which exponentially suppresses errors by encoding logical qubits in multiple physical qubits nielsen_and_chuang ; lidar_brun_2013_qec .

In near-term devices, implementing error correction is infeasible, so we must find other ways to handle the noise. A promising approach to bridge the gap between noisy and error-corrected quantum computing is error mitigation. Here, we leverage multiple noisy estimates to construct a better approximation of the noise-free result. The most prominent examples are Probabilistic Error Cancellation (PEC) berg2023probabilistic ; Piveteau_2022 and Zero Noise Extrapolation (ZNE) Temme_2017 . While error mitigation in general scales exponentially quek2023exponentially , a combination of PEC and ZNE has recently been demonstrated in a 127-qubit experiment at a circuit depth beyond the reach of exact classical methods kim_2023_utility ; anand2023classical . The rate of the exponential cost of error mitigation directly relates to the errors in the quantum devices. These errors are expected to be reduced to a level at which noisy devices with error mitigation can perform practically relevant tasks even before error correction is available Bravyi_2022 . PEC and ZNE mitigate the errors in expectation values. While this finds many applications, e.g., in quantum chemistry and physics, most quantum optimization farhi_2014_qaoa ; egger2021warm ; zoufal_2023_blackbox and many quantum machine learning algorithms Zoufal_2019_qgan ; letcher2023tight build directly on measured samples from a quantum computer. In optimization, having access to an objective value but not to the samples corresponds to knowing the value of an optimal solution but not how to realize it. Obtaining these samples is thus key to scaling sample-based algorithms on noisy hardware.

In this paper, we discuss the impact of noise on sampling bit strings from a noisy quantum computer and quantify the sampling overhead required to extract good solutions from noisy devices, e.g., in the context of optimization. Furthermore, we connect our findings to the Conditional Value at Risk (CVaR, also known as Expected Shortfall), an alternative loss function introduced in Ref. barkoutsos_2020_cvar . We show that CVaR is robust against noise and yields meaningful information about expectation values even from noisy samples. This feature was already conjectured in Ref. barkoutsos_2020_cvar but had not been shown formally. Our work closes this gap and shows that CVaR evaluated on noisy samples yields provable bounds on noise-free observables. We demonstrate these bounds for optimization problems on a real quantum computer with up to 127 qubits, where we find close agreement between experiment and theory. In particular, this allows us to apply the known noise-free performance bounds for the Quantum Approximate Optimization Algorithm (QAOA) for MAXCUT on 3-regular graphs farhi_2014_qaoa ; wurtz_2021_qaoa . Thus, our work yields provable performance guarantees for a variational algorithm even on noisy hardware.

The remainder of this paper is organized as follows. First, Sec. II discusses the impact of noise on sampling and how to quantify it. Then, Sec. III formally defines the CVaR and shows that it can provide provable bounds on noise-free expectation values from noisy samples. Afterwards, Sec. IV discusses the implications of the presented results for applications in optimization, machine learning, and quantum time evolution. Sec. V demonstrates the results on a real quantum computer with up to 127 qubits, where we find close agreement with the theory. Last, Sec. VI concludes the paper and discusses open questions for further research.

II Sampling from Noisy Quantum Computers

Consider an initial $n$ -qubit quantum state $\rho_{0}$ , a quantum operation $\mathcal{U}(\cdot)=U\cdot U^{\dagger}$ , and the resulting state $\rho=\mathcal{U}(\rho_{0})$ . On a real quantum computer, we usually do not have access to the ideal operation $\mathcal{U}$ but only to a noisy version $\widetilde{\mathcal{U}}$ , which we model by $\mathcal{U}\circ\Lambda$ . Here, $\Lambda$ denotes the noise model. We denote the resulting noisy state by $\widetilde{\rho}=\widetilde{\mathcal{U}}(\rho_{0})$ .

For simplicity, we assume the Pauli-Lindblad noise model introduced in Ref. berg2023probabilistic

$$\Lambda(\rho)=\prod_{k\in\mathcal{K}}\left(w_{k}\,(\cdot)+(1-w_{k})P_{k}(\cdot)P_{k}\right)\rho. \qquad (1)$$

Here, $\mathcal{K}$ denotes the index set for (local) Pauli error terms $P_{k}$ , and $w_{k}=(1+e^{-2\lambda_{k}})/2$ for corresponding model coefficients $\lambda_{k}$ that determine the strength of the noise. The assumption of Pauli noise can usually be justified via Pauli twirling knill_randomized_2008 ; dankert_exact_2009 ; magesan_scalable_2011 . In Appendix A we discuss Pauli twirling and the assumption of a Pauli noise model in more detail.

In general, a quantum circuit is not a single operation $\mathcal{U}$ but a concatenation of layers $\mathcal{U}_{i}$ , $i=1,\ldots,l$ . Their noisy versions are $\widetilde{\mathcal{U}}_{i}$ with corresponding noise models $\Lambda_{i}$ . Crucially, this allows us to learn the noise model for each layer independently berg2023probabilistic . A common assumption is that the layers $\mathcal{U}_{i}$ consist of non-overlapping CNOT gates (or other hardware-native two-qubit Clifford gates) and that these layers possibly alternate with layers of single-qubit gates. Single-qubit gates are assumed to be noise-free since their errors are an order of magnitude smaller than those of two-qubit gates. Therefore, only the noise of the two-qubit gate layers is considered.

Under this layer structure, and assuming that the noise model of the quantum processor is sparse, Ref. berg2023probabilistic introduces a protocol to efficiently learn the model coefficients $\lambda_{k}$ . A property of $\Lambda$ that characterizes the overall strength of the noise is $\gamma=e^{2\sum_{k}\lambda_{k}}$ . It has a direct operational interpretation, since $\gamma^{2}$ is the sampling overhead of applying PEC to mitigate the noise when estimating an expectation value Temme_2017 ; berg2023probabilistic .

Here, we first focus on sampling from noisy quantum computers instead of estimating expectation values. Suppose we prepare a quantum state and afterwards measure the qubits. Then, the probability to sample a bit string $x\in\{0,1\}^{n}$ is given by $p_{x}=\operatorname{tr}(\rho|x\rangle\!\langle x|)$ for the noise-free state $\rho$ and by $\widetilde{p}_{x}=\operatorname{tr}(\widetilde{\rho}|x\rangle\!\langle x|)$ for the noisy state $\widetilde{\rho}$ . The noise model introduced in Eq. (1) can also be interpreted as follows: with probability $\prod_{k}w_{k}$ no error occurs and we sample a bit string from $\rho$ , and with probability $1-\prod_{k}w_{k}$ we sample from a state where at least one error occurred. For $\lambda_{k}\ll 1$ we can use $e^{x}=1+x+\mathcal{O}(x^{2})$ , from which it follows that $w_{k}=e^{-\lambda_{k}}+\mathcal{O}(\lambda_{k}^{2})$ , and thus $\prod_{k}w_{k}\approx 1/\sqrt{\gamma}$ . In fact, $w_{k}-e^{-\lambda_{k}}=(1-e^{-\lambda_{k}})^{2}/2\geq 0$ , so $\prod_{k}w_{k}\geq 1/\sqrt{\gamma}$ holds exactly. Then, the law of total probability kokosaka_2000_probability implies the lower bound:

$$\widetilde{p}_{x}\geq p_{x}/\sqrt{\gamma}. \qquad (2)$$

In other words, if a noise-free state $\rho$ has probability $p_{x}$ to sample a bit string of interest $x$ , then, if $\rho$ is approximated by $\widetilde{\rho}$ prepared through a noisy process characterized by $\gamma$ , we need a multiplicative sampling overhead of $\sqrt{\gamma}$ to guarantee at least the same probability of sampling $x$ as for the noise-free state. Thus, as long as we are only interested in generating relevant bit strings that we can efficiently evaluate classically, we can deal with the noise by measuring $\sqrt{\gamma}$ -times more often. This is in contrast to the multiplicative sampling overhead $\gamma^{2}$ introduced by PEC when we are interested in estimating expectation values. Interestingly, if we apply PEC and then determine only the sampling probabilities, without evaluating an expectation value, we find that the sampling probabilities are lower bounded by $p_{x}/\gamma$ , i.e., PEC “amplifies” the noise to achieve an unbiased estimation of expectation values, see Appendix B for more details.
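The bound in Eq. (2) can be checked numerically on a toy model. The sketch below is only an illustration and not the learning protocol of Ref. berg2023probabilistic : it assumes, hypothetically, one Pauli-$X$ error term per qubit acting directly before measurement, so that bit $i$ flips independently with probability $1-w_{i}$ , and it draws the coefficients $\lambda_{k}$ and the noise-free distribution at random.

```python
import itertools
import math
import random

def bit_flip_channel(p_ideal, flip_probs):
    """Exact noisy distribution when bit i flips independently with probability flip_probs[i]."""
    n = len(flip_probs)
    p_noisy = {}
    for x in itertools.product((0, 1), repeat=n):
        total = 0.0
        for y, p_y in p_ideal.items():
            prob = p_y
            for i in range(n):
                # Probability that the noise maps bit y[i] to x[i].
                prob *= flip_probs[i] if x[i] != y[i] else 1.0 - flip_probs[i]
            total += prob
        p_noisy[x] = total
    return p_noisy

rng = random.Random(0)
n = 3
lams = [rng.uniform(0.01, 0.2) for _ in range(n)]    # hypothetical lambda_k, one per qubit
w = [(1 + math.exp(-2 * lam)) / 2 for lam in lams]   # w_k as defined below Eq. (1)
gamma = math.exp(2 * sum(lams))

basis = list(itertools.product((0, 1), repeat=n))
raw = [rng.random() for _ in basis]
norm = sum(raw)
p_ideal = {x: r / norm for x, r in zip(basis, raw)}  # arbitrary noise-free distribution
p_noisy = bit_flip_channel(p_ideal, [1 - wk for wk in w])

for x in basis:
    # Law of total probability: the error-free branch alone contributes p_x * prod(w_k).
    assert p_noisy[x] >= p_ideal[x] * math.prod(w) >= p_ideal[x] / math.sqrt(gamma)
```

Any other Pauli error pattern only adds probability mass, which is why the error-free branch already yields the bound.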

The sampling overhead $\sqrt{\gamma}$ can be derived from the noise model resulting from the noise learning protocol introduced in Ref. berg2023probabilistic . However, in the present context, we are not interested in the full description of the noise model, only in $\gamma$ . Recently, Ref. mckay2023benchmarking introduced the Layer Fidelity (LF), a metric to measure noise present in the hardware when executing a circuit. The LF also assumes the layered gate structure mentioned above and determines the resulting fidelity for each layer of gates. It has a direct connection to the sampling overhead via $\text{LF}_{i}=1/\sqrt{\gamma_{i}}$ , where $\gamma_{i}$ characterizes the noise of layer $i$ . For multiple layers we can thus rewrite Eq. (2) as

$$\widetilde{p}_{x}\geq p_{x}\prod_{i}\text{LF}_{i}. \qquad (3)$$

Further, the LF is much cheaper to evaluate than learning the full noise model. Thus, for a given circuit, the LF allows us to efficiently determine the sampling overhead needed to compensate for the noise.
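Since $\text{LF}_{i}=1/\sqrt{\gamma_{i}}$ and $\gamma=\prod_{i}\gamma_{i}$ , both the sampling overhead of Eq. (3) and the PEC overhead $\gamma^{2}$ follow directly from the measured layer fidelities. A minimal sketch; the fidelity values in the usage line are made up for illustration:

```python
def sampling_overheads(layer_fidelities):
    """Return (sqrt(gamma), gamma**2): the sampling overhead of Eq. (3) and the PEC overhead."""
    sqrt_gamma = 1.0
    for lf in layer_fidelities:
        if not 0.0 < lf <= 1.0:
            raise ValueError("layer fidelities must lie in (0, 1]")
        sqrt_gamma /= lf                 # LF_i = 1 / sqrt(gamma_i)
    return sqrt_gamma, sqrt_gamma ** 4   # gamma**2 = sqrt(gamma)**4

# Ten layers, each with a hypothetical layer fidelity of 0.9.
sqrt_gamma, pec = sampling_overheads([0.9] * 10)   # about 2.87 and 67.7
```

The two overheads differ by a fourth power, which is why sampling good bit strings is far cheaper than unbiased estimation of expectation values.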

Other types of errors that we have not mentioned so far are state preparation and measurement (SPAM) errors. In principle, we can also determine a sampling overhead and compensate for the SPAM errors by increasing the number of samples. However, particularly for measurement errors, there exist other protocols that might allow for statistical corrections with a smaller sampling overhead van_den_Berg_2022_trex ; Nation_2021_m3 . A systematic study of these types of errors would be interesting for future research.

III Conditional Value-at-Risk

Section II shows that we can sample bit strings of interest $x$ , i.e., corresponding to the noise-free state $\rho$ , by taking $\sqrt{\gamma}$ -times more samples from the noisy state $\widetilde{\rho}$ . However, we usually do not know which samples correspond to the noise-free state and which samples were affected by noise. We now leverage the insight of Sec. II and show that the CVaR can provide provable bounds on noise-free expectation values from noisy samples. The CVaR has already been suggested as a loss function and observable in Ref. barkoutsos_2020_cvar , however, only based on intuition and without theoretical justification.

Consider an integrable real-valued random variable $X$ with cumulative distribution function $F_{X}:\mathbb{R}\rightarrow[0,1]$ . Then, the (lower) CVaR at level $\alpha\in(0,1]$ is defined as

$$\operatorname{CVaR}_{\alpha}(X)=\alpha^{-1}\mathbb{E}[X;X\leq x_{\alpha}]+x_{\alpha}\left(1-\alpha^{-1}\mathbb{P}[X\leq x_{\alpha}]\right)\,,$$

where $x_{\alpha}=\inf\{x\in\mathbb{R}\colon F_{X}(x)\geq\alpha\}$ . In the case when $F_{X}(x_{\alpha})=\alpha$ , this definition simplifies to $\operatorname{CVaR}_{\alpha}(X)=\mathbb{E}[X\mid X\leq x_{\alpha}]$ , i.e. we are considering the expectation of $X$ when we are conditioning $X$ to take values in its bottom $\alpha$ quantile. Accordingly, we define the upper CVaR as

$$\overline{\operatorname{CVaR}}_{\alpha}(X)=-\operatorname{CVaR}_{\alpha}(-X)\,. \qquad (4)$$

Therefore we are considering the expectation of $X$ conditioned on values in its upper $\alpha$ quantile. This allows us to prove the following lemma.
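For a discrete distribution, the definition of $\operatorname{CVaR}_{\alpha}$ and Eq. (4) translate directly into code. The following sketch (the function names are ours, not from a library) accumulates the distribution function over the outcomes in increasing order until it reaches $\alpha$ , which identifies $x_{\alpha}$ , and then adds the correction term $x_{\alpha}(1-\alpha^{-1}\mathbb{P}[X\leq x_{\alpha}])$ .

```python
def cvar_lower(values, probs, alpha):
    """Lower CVaR_alpha of a discrete random variable with P[X = values[i]] = probs[i]."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    dist = {}
    for x, p in zip(values, probs):
        dist[x] = dist.get(x, 0.0) + p          # merge repeated outcomes
    cdf = 0.0      # F_X(x) for the current x
    partial = 0.0  # E[X; X <= x] for the current x
    for x in sorted(dist):
        cdf += dist[x]
        partial += x * dist[x]
        if cdf >= alpha - 1e-12:                # first x with F_X(x) >= alpha is x_alpha
            return partial / alpha + x * (1.0 - cdf / alpha)
    raise ValueError("probabilities must sum to 1")

def cvar_upper(values, probs, alpha):
    """Upper CVaR via Eq. (4): -CVaR_alpha(-X)."""
    return -cvar_lower([-x for x in values], probs, alpha)

values, probs = [0, 1, 2, 3], [0.25] * 4
low = cvar_lower(values, probs, 0.3)   # 1/6: mass 0.25 at 0 plus 0.05 at 1, divided by 0.3
up = cvar_upper(values, probs, 0.5)    # 2.5: the mean of the upper half {2, 3}
```

For $\alpha=1$ both functions return the ordinary expectation value, here $1.5$ .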

Lemma 1.

Suppose a random variable $X$ with probabilities $p_{x}=\mathbb{P}[X=x]$ for $x\in\mathbb{R}$ . Further, suppose another random variable $\widetilde{X}$ as well as a given constant $C\geq 1$ such that $\widetilde{p}_{x}=\mathbb{P}[\widetilde{X}=x]\geq p_{x}/C$ . Then we have

$$\operatorname{CVaR}_{\alpha}(\widetilde{X})\leq\mathbb{E}[X]\leq\overline{\operatorname{CVaR}}_{\alpha}(\widetilde{X})\,, \qquad (5)$$

for all $\alpha\leq 1/C$ . Thus, the lower and upper CVaR of $\widetilde{X}$ with $\alpha\leq 1/C$ define lower and upper bounds, respectively, of the expectation value of $X$ .

Proof.

By monotonicity of $\operatorname{CVaR}_{\alpha}(\widetilde{X})$ in $\alpha$ , it suffices to show the claim for $\alpha=1/C$ . Let $x_{1}<\cdots<x_{n}$ denote the support of $\widetilde{p}$ . Take $k\leq n$ such that $\sum_{i\leq k-1}\widetilde{p}_{x_{i}}<1/C\leq\sum_{i\leq k}\widetilde{p}_{x_{i}}$ , then

$$\operatorname{CVaR}_{1/C}(\widetilde{X})=C\sum_{i\leq k}x_{i}\widetilde{p}_{x_{i}}+x_{k}\left(1-C\sum_{i\leq k}\widetilde{p}_{x_{i}}\right)\,.$$

Clearly, the $p$ minimizing $\mathbb{E}[X]=\sum_{x}xp_{x}$ and satisfying $p_{x}\leq C\widetilde{p}_{x}$ for all $x$ is also supported on $\{x_{1},\ldots,x_{n}\}$ and satisfies

$$p_{x_{i}}=C\widetilde{p}_{x_{i}}\ \text{for all } i<k,\quad\text{and}\quad p_{x_{k}}\leq 1-\sum_{i<k}p_{x_{i}}=1-C\sum_{i<k}\widetilde{p}_{x_{i}}.$$

From this, the claim is immediate by using the above to lower bound $\mathbb{E}[X]$ . The upper bound follows by applying the lower bound to $-X$ and $-\widetilde{X}$ in place of $X$ and $\widetilde{X}$ . ∎
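Lemma 1 can also be checked numerically. In the following sketch, which is an illustration and not part of the proof, the noisy distribution is taken to be the mixture $\widetilde{p}=p/C+(1-1/C)q$ with an arbitrary distribution $q$ . This mimics the picture of Sec. II, where a sample is noise-free with probability $1/C$ , and it satisfies $\widetilde{p}_{x}\geq p_{x}/C$ by construction. The support, $C$ , and the random seed are arbitrary choices.

```python
import random

def cvar_lower(dist, alpha):
    """Lower CVaR_alpha of a discrete distribution given as {value: probability}."""
    cdf, partial = 0.0, 0.0
    for x in sorted(dist):
        cdf += dist[x]
        partial += x * dist[x]
        if cdf >= alpha - 1e-12:                 # x is the alpha-quantile x_alpha
            return partial / alpha + x * (1.0 - cdf / alpha)
    raise ValueError("probabilities must sum to 1")

def cvar_upper(dist, alpha):
    return -cvar_lower({-x: p for x, p in dist.items()}, alpha)   # Eq. (4)

def random_dist(support, rng):
    weights = [rng.random() for _ in support]
    norm = sum(weights)
    return {x: w / norm for x, w in zip(support, weights)}

rng = random.Random(1)
support = [rng.uniform(-5.0, 5.0) for _ in range(20)]   # arbitrary real outcomes
C = 3.0
p = random_dist(support, rng)        # distribution of the noise-free X
q = random_dist(support, rng)        # arbitrary "error" distribution
# With probability 1/C the sample is noise-free, hence p_tilde[x] >= p[x] / C.
p_tilde = {x: p[x] / C + (1 - 1 / C) * q[x] for x in support}
mean = sum(x * px for x, px in p.items())

for alpha in (1 / C, 0.5 / C, 0.1 / C):
    assert cvar_lower(p_tilde, alpha) <= mean <= cvar_upper(p_tilde, alpha)
```

Smaller values of $\alpha$ give looser but still valid bounds, in line with the monotonicity argument at the start of the proof.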

Next, let us consider again a noise-free $n$ -qubit quantum state $\rho$ , its noisy version $\widetilde{\rho}$ , and the corresponding $\gamma$ . Further, suppose a diagonal Hamiltonian $H$ , which can also be interpreted as a function $h:\{0,1\}^{n}\rightarrow\mathbb{R}$ . Let us define the random variables $X,\widetilde{X}\in\{0,1\}^{n}$ , as the result of measuring $\rho$ and $\widetilde{\rho}$ , respectively. Then, Lemma 1 and Eq. (2) immediately imply

$$\operatorname{CVaR}_{\alpha}(h(\widetilde{X}))\leq\mathbb{E}[h(X)]\leq\overline{\operatorname{CVaR}}_{\alpha}(h(\widetilde{X}))\,, \qquad (6)$$

for all $\alpha\leq 1/\sqrt{\gamma}$ . Since $\operatorname{tr}(\rho H)=\mathbb{E}[h(X)]$ for a diagonal $H$ , Eq. (6) implies that the lower/upper CVaR computed from the noisy samples of $\widetilde{\rho}$ provide lower/upper bounds for the noise-free expectation value $\operatorname{tr}(\rho H)$ . Further, suppose $\rho$ is the ground state of the diagonal $H$ . Then, $h(\widetilde{X})$ cannot take any values smaller than the ground state energy $\operatorname{tr}(\rho H)$ , and the left inequality in Eq. (6) becomes an equality. Thus, the noisy lower CVaR equals the ground state energy (similarly for the upper CVaR if $\rho$ corresponds to the maximally excited state of $H$ ). Moreover, if the noisy CVaR equals the ground state energy, the fidelity between the noise-free state $\rho$ and the noisy state $\widetilde{\rho}$ is lower bounded by the considered $\alpha$ , i.e., $\mathcal{F}(\rho,\widetilde{\rho})\geq\alpha$ .

Diagonal Hamiltonians arise, e.g., in optimization problems or in the form of projectors $|x\rangle\!\langle x|$ , as used, e.g., for fidelity estimation. We discuss these applications in more detail in Sec. IV.1 and Sec. IV.2. However, many applications also involve non-diagonal Hamiltonians, most prominently in quantum chemistry and physics peruzzo_2014_vqe . Suppose a non-diagonal Hamiltonian $H=\sum_{i}c_{i}P_{i}$ , where $P_{i}$ denote Pauli terms and $c_{i}$ the corresponding weights. Then, we can decompose $H$ into a sum $H=\sum_{j}H_{j}$ of Hamiltonians, each consisting of a subset of qubit-wise commuting Pauli strings, i.e., strings whose single-qubit factors commute on every qubit. All Pauli terms in $H_{j}$ can then be simultaneously diagonalized via single-qubit rotations. Thus, we can assume the $H_{j}$ are diagonal without loss of generality. We define the corresponding functions $h_{j}:\{0,1\}^{n}\rightarrow\mathbb{R}$ as well as noise-free and noisy random variables $X_{j},\widetilde{X}_{j}$ , respectively, resulting from measuring the quantum states with the corresponding post-rotations that diagonalize the Hamiltonians $H_{j}$ . This implies

$$\sum_{j}\operatorname{CVaR}_{\alpha}(h_{j}(\widetilde{X}_{j}))\leq\operatorname{tr}(\rho H)\leq\sum_{j}\overline{\operatorname{CVaR}}_{\alpha}(h_{j}(\widetilde{X}_{j}))\,, \qquad (7)$$

for all $\alpha\leq 1/\sqrt{\gamma}$ , which extends the previous result to non-diagonal Hamiltonians. Note that, in contrast to diagonal Hamiltonians, we can no longer draw conclusions about the ground state energy or the fidelity between the noisy state and the ground state. For instance, the lower bound in Eq. (7) can be strictly smaller than the ground state energy.
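One simple way to obtain the decomposition $H=\sum_{j}H_{j}$ is a greedy grouping of qubit-wise commuting Pauli strings; many other grouping heuristics exist. The sketch below assumes Pauli strings are given as text such as `"XIZ"`, with character $i$ acting on qubit $i$ and bit $x_{i}$ . It forms the groups, determines the single-qubit basis each qubit must be measured in (a qubit measured in the $X$ basis would be rotated by a Hadamard gate, one in the $Y$ basis by $S^{\dagger}$ followed by a Hadamard gate), and evaluates $h_{j}(x)$ : after the rotation, every non-identity factor contributes the eigenvalue $(-1)^{x_{i}}$ . All function names are hypothetical.

```python
def qubitwise_commute(a, b):
    """True if Pauli strings a and b commute on every qubit individually."""
    return all(p == q or "I" in (p, q) for p, q in zip(a, b))

def group_qwc(paulis):
    """Greedily partition Pauli strings into qubit-wise commuting groups."""
    groups = []
    for pauli in paulis:
        for group in groups:
            if all(qubitwise_commute(pauli, other) for other in group):
                group.append(pauli)
                break
        else:
            groups.append([pauli])
    return groups

def measurement_basis(group):
    """Per-qubit basis (X, Y, Z, or I) that diagonalizes every term of the group."""
    basis = []
    for i in range(len(group[0])):
        letters = {p[i] for p in group} - {"I"}   # at most one letter per qubit
        basis.append(letters.pop() if letters else "I")
    return "".join(basis)

def h_value(coeffs, group, bits):
    """h_j(x) for H_j = sum_P c_P P, evaluated on the bits measured after the rotation."""
    total = 0.0
    for pauli in group:
        sign = 1
        for letter, bit in zip(pauli, bits):
            if letter != "I" and bit == 1:
                sign = -sign                      # eigenvalue (-1)^{x_i} per factor
        total += coeffs[pauli] * sign
    return total

groups = group_qwc(["ZZI", "ZIZ", "XXI", "IXX", "YIY"])
# groups == [['ZZI', 'ZIZ'], ['XXI', 'IXX'], ['YIY']]
```

Note that `ZZI` and `XXI` commute but are not qubit-wise commuting, so they end up in different groups; measuring them jointly would require an entangling basis change instead of single-qubit rotations.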

The CVaR can be estimated using Monte Carlo sampling. The variance of this estimator depends on the considered distribution but is always bounded by $\mathcal{O}(1/\alpha^{2})$ . For Normal and Bernoulli distributions, however, one can show that in the present context the variance of the CVaR behaves as $\mathcal{O}(1/\alpha)$ for $\alpha\rightarrow 0$ , where for the Bernoulli case we assume that the success probability $p$ satisfies $p=\mathcal{O}(1/\sqrt{\gamma})$ , which is the relevant case for the applications considered later on, cf. Sec. IV.2. The derivations of these variance bounds are provided in Appendix C. Thus, in these cases and for $\alpha=1/\sqrt{\gamma}$ , the variance increases as $\mathcal{O}(\sqrt{\gamma})$ . This renders the CVaR a very promising noise-robust loss function for variational quantum algorithms. The variance is amplified significantly less than for PEC, where it increases as $\mathcal{O}(\gamma^{2})$ . However, PEC comes with much stronger theoretical guarantees, i.e., it provides an unbiased estimator instead of a bound. Thus, depending on the application, the CVaR bounds might not be sufficient.
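A plug-in Monte Carlo estimator evaluates the CVaR of the empirical distribution of the measured values: sort the $N$ samples, average the lowest (for the upper CVaR, the highest) $\alpha N$ of them, and give the sample at the boundary a fractional weight when $\alpha N$ is not an integer. A minimal sketch with made-up sample values:

```python
import math

def empirical_cvar(samples, alpha, upper=False):
    """CVaR_alpha of the empirical distribution of `samples` (upper CVaR if upper=True)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    ordered = sorted(samples, reverse=upper)        # tail of interest comes first
    budget = alpha * len(ordered)                   # probability mass alpha, in units of samples
    full = math.floor(budget)
    total = sum(ordered[:full])                     # samples lying entirely in the tail
    if full < len(ordered):
        total += (budget - full) * ordered[full]    # fractional weight of the boundary sample
    return total / budget

energies = [0.0, 1.0, 2.0, 3.0]                     # hypothetical measured values h(x)
lower = empirical_cvar(energies, 0.3)               # (0 + 0.2 * 1) / 1.2 = 1/6
upper = empirical_cvar(energies, 0.5, upper=True)   # (3 + 2) / 2 = 2.5
```

Only about $\alpha N$ samples enter the estimate, which is the intuition behind the $\mathcal{O}(1/\alpha)$ variance growth discussed above.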

In the remainder of this section, we discuss improvements to the lower and upper bounds for cases where we have more information about the noise-free state, i.e., properties that the bit strings measured from the noise-free state must have but that might not persist under noise. Examples of such properties are particle-number conservation in quantum chemistry Bonet_Monroig_2018_post_selection ; Choquette_2021 and constraint satisfaction in quantum optimization barkoutsos_2020_cvar .

Suppose a function $\mathcal{F}:\{0,1\}^{n}\rightarrow\{0,1\}$ that determines whether a bit string $x$ has a required property. Here, $\mathcal{F}(x)=1$ indicates the presence of the property. Further, suppose a given Hamiltonian $H$ and, for simplicity, let us assume it is diagonal and defined by a function $h:\{0,1\}^{n}\rightarrow\mathbb{R}$ . From this, we can construct a modified Hamiltonian $H_{\mathcal{F}}^{M}$ defined by the function

$$h_{\mathcal{F}}^{M}(x)=\begin{cases}h(x)&\text{if }\mathcal{F}(x)=1,\\ M&\text{otherwise,}\end{cases} \qquad (8)$$

where $M$ is a given constant. We thus have $\operatorname{tr}(\rho H)=\operatorname{tr}(\rho H_{\mathcal{F}}^{M})$ in the noise-free case for any $M$ , since all noise-free samples $x$ satisfy $\mathcal{F}(x)=1$ . Next, we assume constants $M_{l}$ and $M_{u}$ that satisfy $M_{l}\leq h(x)\leq M_{u}$ for all $x$ with $\mathcal{F}(x)=1$ . Samples with $\mathcal{F}(x)=0$ must be affected by noise, which allows us to filter out samples where the noise destroys the required property. Although there might still be noisy samples that are feasible, the post-selection reduces the impact of noise. Due to the equality of expectation values in the noise-free case and the choice of $M_{l}$ and $M_{u}$ , we immediately get

$$\operatorname{CVaR}_{\alpha}(h_{\mathcal{F}}^{M_{u}}(\widetilde{X}))\leq\mathbb{E}[h(X)]\leq\overline{\operatorname{CVaR}}_{\alpha}(h_{\mathcal{F}}^{M_{l}}(\widetilde{X}))\,, \qquad (9)$$

for all $\alpha\leq 1/\sqrt{\gamma}$ . This can lead to significantly better bounds since we can leverage the additional information about the considered problem to filter out more noisy samples. For non-diagonal Hamiltonians, see Eq. (7), it is possible to define a filter function $\mathcal{F}_{j}$ for each $H_{j}$ .
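The filtered bounds of Eq. (9) only require replacing the values of infeasible samples before computing the CVaR. The sketch below uses a hypothetical filter that keeps bit strings of Hamming weight two (as for a QAOA' with a Hamming-weight-preserving mixer, cf. Sec. IV.1), a hypothetical objective $h(x)=\sum_{i}i\,x_{i}$ whose feasible values lie in $[M_{l},M_{u}]=[1,5]$ , and made-up samples. It also illustrates that the post-selected average lies between the two bounds; this requires the fraction of feasible samples to be at least $\alpha$ , which holds in expectation for $\alpha\leq 1/\sqrt{\gamma}$ by Eq. (2).

```python
import math

def cvar(samples, alpha, upper=False):
    """Empirical lower (or upper) CVaR_alpha of a list of sample values."""
    ordered = sorted(samples, reverse=upper)
    budget = alpha * len(ordered)
    full = math.floor(budget)
    total = sum(ordered[:full])
    if full < len(ordered):
        total += (budget - full) * ordered[full]
    return total / budget

def filtered_bounds(bitstrings, h, feasible, m_low, m_up, alpha):
    """Lower and upper bounds of Eq. (9) computed from noisy bit strings."""
    def h_masked(x, m):
        return h(x) if feasible(x) else m                 # h_F^M from Eq. (8)
    lower = cvar([h_masked(x, m_up) for x in bitstrings], alpha)               # infeasible -> M_u
    upper = cvar([h_masked(x, m_low) for x in bitstrings], alpha, upper=True)  # infeasible -> M_l
    return lower, upper

def h(x):
    return sum(i * int(b) for i, b in enumerate(x))      # hypothetical objective

def feasible(x):
    return x.count("1") == 2                             # hypothetical Hamming-weight filter

samples = ["1100", "1010", "0110", "1110", "0011", "0001", "1001"]   # made-up noisy shots
lower, upper = filtered_bounds(samples, h, feasible, m_low=1, m_up=5, alpha=0.5)
kept = [h(x) for x in samples if feasible(x)]
post_selected_mean = sum(kept) / len(kept)   # 2.8, between lower = 15/7 and upper = 24/7
```

Assigning $M_{u}$ to infeasible samples pushes them out of the lower tail, and assigning $M_{l}$ pushes them out of the upper tail, which is why the two bounds use different constants.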

Another implication of our results is that the average over the post-selected noisy samples must lie between the lower and upper bounds resulting from the filtered CVaR, due to the monotonicity of CVaR with respect to $\alpha$ . Thus, the CVaR allows us to bound the bias that post-selection may introduce and provides a quality measure for the estimated expectation value.

IV Applications

We now discuss the presented theory on sampling probabilities and CVaR in the context of different applications: first, quantum optimization farhi_2014_qaoa ; barkoutsos_2020_cvar ; egger2021warm ; zoufal_2023_blackbox ; weidenfeller2022scaling , and second, fidelity-based algorithms, such as Quantum Support Vector Machines (QSVM) Havlicek2019 ; gentinetta2022complexity ; gentinetta2023quantum as well as Variational Quantum Time Evolution (VarQTE) McArdle_2019_varqte ; Yuan_2019_varqte ; Zoufal_2021_varqbm ; Zoufal_2023_varqte_error_bounds ; Gacon_2021_qnspsa ; gacon2023stochastic ; gacon2023variational . These are illustrative examples; the theory presented here is applicable to many other domains, such as quantum chemistry and physics.

IV.1 (Variational) Quantum Optimization

Many variational quantum algorithms have been proposed to solve discrete optimization problems, such as Quadratic Unconstrained Binary Optimization (QUBO). Most of them have a similar structure and interpret every measured bit string as a potential solution to the problem. Proposals that derive variable values from expectation values Bravyi2019 ; fuller2021approximate ; teramoto2023quantumrelaxation ; patti2022variational are, however, not in the focus of our work.

Suppose a generic unconstrained binary optimization problem of the form

$$\min_{x\in\{0,1\}^{n}}f(x)\,, \qquad (10)$$

where $f:\{0,1\}^{n}\to\mathbb{R}$ is an objective function on $n$ binary variables. For instance, a QUBO has $f(x)=x^{T}Qx$ with $Q\in\mathbb{R}^{n\times n}$ . In the case of a QUBO, we can apply the change of variables $x_{i}=(1-z_{i})/2$ for $z_{i}\in\{-1,+1\}$ , replace each $z_{i}$ by the Pauli $Z_{i}$ matrix on qubit $i$ and each product $z_{i}z_{j}$ by $Z_{i}\otimes Z_{j}$ to define a diagonal Hamiltonian $H$ , and thereby translate Eq. (10) into the ground state problem lucas_2014_ising

$$\min_{\ket{\psi}}\braket{\psi}{H}{\psi}\,. \qquad (11)$$

As mentioned in Sec. III, we can transform any generic function to a Hamiltonian where $f(x)$ defines the diagonal element of $H$ at the position of the computational basis state $\ket{x}$ zoufal_2023_blackbox .
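The change of variables for a QUBO can be carried out mechanically. A sketch, assuming $Q$ is given as a nested list that need not be symmetric: it uses $x_{i}^{2}=x_{i}=(1-z_{i})/2$ on the diagonal and $x_{i}x_{j}=(1-z_{i}-z_{j}+z_{i}z_{j})/4$ off the diagonal, and collects the constant, linear, and quadratic Ising coefficients. Replacing $z_{i}$ by $Z_{i}$ then gives $H$ , and a brute-force loop over all bit strings confirms that the diagonal of $H$ reproduces $f$ . The matrix $Q$ is an arbitrary example.

```python
import itertools

def qubo_to_ising(Q):
    """Coefficients (offset, lin, quad) with x^T Q x = offset + sum_i lin[i] z_i
    + sum_{i<j} quad[(i, j)] z_i z_j under the substitution x_i = (1 - z_i) / 2."""
    n = len(Q)
    offset, lin, quad = 0.0, [0.0] * n, {}
    for i in range(n):
        offset += Q[i][i] / 2          # x_i^2 = x_i = (1 - z_i) / 2
        lin[i] -= Q[i][i] / 2
        for j in range(i + 1, n):
            q = Q[i][j] + Q[j][i]      # both entries multiply x_i x_j
            offset += q / 4            # x_i x_j = (1 - z_i - z_j + z_i z_j) / 4
            lin[i] -= q / 4
            lin[j] -= q / 4
            quad[(i, j)] = q / 4
    return offset, lin, quad

def ising_value(offset, lin, quad, z):
    """Diagonal element of H for the basis state with spins z_i = 1 - 2 x_i."""
    return (offset + sum(c * zi for c, zi in zip(lin, z))
            + sum(c * z[i] * z[j] for (i, j), c in quad.items()))

Q = [[1.0, -2.0, 0.0], [0.0, 3.0, 1.0], [4.0, 0.0, -1.0]]   # arbitrary example
coeffs = qubo_to_ising(Q)
for x in itertools.product((0, 1), repeat=3):
    f = sum(Q[i][j] * x[i] * x[j] for i in range(3) for j in range(3))
    assert abs(f - ising_value(*coeffs, [1 - 2 * xi for xi in x])) < 1e-12
```

The bit value $x_{i}=0$ maps to $z_{i}=+1$ , matching the eigenvalue of $Z_{i}$ on $\ket{0}$ .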

Most variational quantum algorithms for binary optimization are defined via a parameterized ansatz $\ket{\psi(\theta)}$ with parameters $\theta\in\mathbb{R}^{d}$ , a loss function $\mathcal{L}(\theta)$ that maps parameter values to a loss value, and an optimizer to solve

$$\min_{\theta\in\mathbb{R}^{d}}\mathcal{L}(\theta). \qquad (12)$$

After the final parameters $\theta^{*}$ are determined, the resulting state $\ket{\psi(\theta^{*})}$ is measured and the sampled bit strings are used as potential solutions to the problem. Samples obtained during the execution of the algorithm can also be considered as solutions in case they achieve better objective values than the final samples.

If we set $\mathcal{L}(\theta)=\braket{\psi(\theta)}{H}{\psi(\theta)}$ for some ansatz $\ket{\psi(\theta)}$ , we get the Variational Quantum Eigensolver (VQE) peruzzo_2014_vqe . Further, if we define the ansatz as

$$\ket{\psi(\theta)}=\prod_{j=1}^{p}e^{-iH_{X}\beta_{j}}e^{-iH\gamma_{j}}\ket{+}, \qquad (13)$$

we get the QAOA farhi_2014_qaoa , where $p$ defines the depth, $\beta_{j},\gamma_{j}\in\mathbb{R}$ are the variational parameters, and $H_{X}=-\sum_{i=1}^{n}X_{i}$ , where $X_{i}$ denotes the Pauli $X$ matrix on qubit $i$ .
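For small instances, the ansatz of Eq. (13) can be simulated directly. The following sketch is a plain NumPy statevector simulation, not the light-cone method used in Sec. V. It assumes MAXCUT with $H=-\sum_{(i,j)}(1-Z_{i}Z_{j})/2$ , so the diagonal of $H$ is the negative cut size, and that bit $i$ of the basis-state index encodes qubit $i$ . The cost layer is a diagonal phase, and the mixer uses $e^{-iH_{X}\beta}=\prod_{i}(\cos\beta\,I+i\sin\beta\,X_{i})$ . The argument `gammas` holds the QAOA angles $\gamma_{j}$ of Eq. (13), not the noise strength $\gamma$ .

```python
import numpy as np

def maxcut_diagonal(n, edges):
    """Diagonal of H = -sum_{(i,j)} (1 - Z_i Z_j)/2, i.e., minus the cut size of each bit string."""
    idx = np.arange(2 ** n)
    bits = (idx[:, None] >> np.arange(n)) & 1      # bits[k, i] = value of qubit i in basis state k
    cut = np.zeros(2 ** n)
    for i, j in edges:
        cut += bits[:, i] != bits[:, j]
    return -cut

def qaoa_state(diag, n, betas, gammas):
    """Statevector of Eq. (13): prod_j exp(-i H_X beta_j) exp(-i H gamma_j) |+>^n."""
    psi = np.full(2 ** n, 2 ** (-n / 2), dtype=complex)
    for beta, gamma in zip(betas, gammas):
        psi = np.exp(-1j * gamma * diag) * psi     # cost layer, diagonal in the Z basis
        psi = psi.reshape([2] * n)
        for axis in range(n):                      # mixer: cos(beta) I + i sin(beta) X per qubit
            psi = np.cos(beta) * psi + 1j * np.sin(beta) * np.flip(psi, axis=axis)
        psi = psi.reshape(-1)
    return psi

def energy(psi, diag):
    return float(np.real(np.vdot(psi, diag * psi)))

edges = [(0, 1), (1, 2), (0, 2)]                   # triangle graph as a toy instance
diag = maxcut_diagonal(3, edges)
e0 = energy(qaoa_state(diag, 3, [0.0], [0.0]), diag)   # approximately -1.5: |+> cuts half the edges on average
```

Flipping a tensor axis of length two swaps the amplitudes of $\ket{0}$ and $\ket{1}$ on that qubit, which is exactly the action of $X_{i}$ .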

The results from Secs. II and III immediately apply to QAOA. Suppose we already have a quantum circuit that, when executed and measured in an ideal noise-free setting, produces good solutions to a considered optimization problem. Sec. II immediately implies that, when the circuit is executed on a noisy device, a sampling overhead of $\sqrt{\gamma}$ is sufficient to extract solutions of the same quality as in the noise-free case. In certain cases, it might be feasible to determine $\theta^{*}$ classically streif2019training ; sack2021quantum and only use the quantum computer to sample good solutions, since evaluating (local) expectation values might be easier than sampling from the full circuit begusic023simulating . However, in cases where we must train the parameterized quantum circuit, we can replace the expectation value by the CVaR barkoutsos_2020_cvar . The results introduced in Sec. III now provide guidance on how to choose $\alpha$ and the required sampling overhead to get good results from a noisy device. We illustrate this on concrete examples in Sec. V.1 and Sec. V.2.

Our results allow us to apply proven performance guarantees for noise-free QAOA to noisy hardware. For MAXCUT on 3-regular graphs, QAOA achieves a worst-case approximation ratio of $0.692$ for $p=1$ farhi_2014_qaoa , $0.7559$ for $p=2$ , and (under certain assumptions) $0.7924$ for $p=3$ wurtz_2021_qaoa . With a $\sqrt{\gamma}$ sampling overhead, these guarantees are recovered even in the noisy regime. Furthermore, for 3-regular graphs, we can always train QAOA with $p\leq 3$ classically by simulating at most 30 qubits at a time Sack2023 , i.e., we can determine the optimal parameters via classical simulation and then sample good solutions with a $\sqrt{\gamma}$ overhead from the quantum computer. Since $\gamma$ grows exponentially with the circuit size, the sampling overhead introduced to combat noise may exceed the cost of a brute-force search. A simple back-of-the-envelope calculation, discussed in Appendix D, determines the minimum layer fidelity required to apply a depth- $p$ QAOA.
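Eq. (2) also yields a simple cost estimate. The number of shots until a bit string with noise-free probability $p_{x}$ is first observed is geometrically distributed with mean $1/\widetilde{p}_{x}\leq\sqrt{\gamma}/p_{x}$ . The sketch below compares this bound with the $2^{n}$ evaluations of an exhaustive search, assuming $L$ layers with identical layer fidelity so that $\sqrt{\gamma}=\text{LF}^{-L}$ by Eq. (3). This crude break-even criterion is a hypothetical illustration and not necessarily the calculation in Appendix D; the instance parameters in the usage line are made up.

```python
import math

def expected_shots_bound(p_ideal, layer_fidelity, num_layers):
    """Bound sqrt(gamma) / p_ideal on the mean number of shots until a bit string with
    noise-free probability p_ideal is first observed (geometric distribution, Eq. (3))."""
    sqrt_gamma = layer_fidelity ** (-num_layers)   # identical layers: prod_i 1 / LF_i
    return sqrt_gamma / p_ideal

def break_even_layer_fidelity(n, p_ideal, num_layers):
    """Smallest layer fidelity for which this bound does not exceed the 2**n evaluations
    of an exhaustive search (a hypothetical, crude criterion)."""
    ratio = p_ideal * 2 ** n
    if ratio < 1:
        return math.inf    # even noise-free sampling is not cheaper than enumeration
    return ratio ** (-1.0 / num_layers)

lf_min = break_even_layer_fidelity(40, 1e-3, 20)   # roughly 0.35
```

Because the required fidelity enters through the power $-1/L$ , deeper circuits quickly push the break-even layer fidelity toward one.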

The Quantum Alternating Operator Ansatz (QAOA’) is a variant of QAOA hadfield_quantum_2019 . Here, a constraint, e.g., a fixed Hamming weight (i.e., a fixed number of ones in a bit string), is enforced by changing the mixer to preserve such states wang2020xymixers ; cook2020vertexcover ; golden2023numerical and starting in (a superposition of) feasible states baertschi2022shortdepth ; baertschi2020grover . Thus, if QAOA’ is executed noise-free, all resulting samples satisfy the given constraint. Checking the constraint therefore provides a filter function $\mathcal{F}$ , as introduced in Sec. III, which helps to improve the CVaR bounds on the corresponding expectation value.

IV.2 Fidelities

Several quantum algorithms leverage fidelity estimation between two quantum states in a sub-routine. In the following, we first discuss how to leverage the CVaR bounds to approximate fidelities on noisy quantum computers and then how this impacts two concrete classes of algorithms: QSVMs and VarQTE.

Suppose we have $n$ -qubit quantum circuits $U$ and $V$ that define $\ket{\psi}=U\ket{0}$ and $\ket{\phi}=V\ket{0}$ , respectively. A common approach to estimate the fidelity between $\ket{\psi}$ and $\ket{\phi}$ is the compute-uncompute method given by

$$\mathcal{F}(\ket{\psi},\ket{\phi})=\left|\braket{0}{V^{\dagger}U}{0}\right|^{2}. \qquad (14)$$

$\mathcal{F}$ is thus the probability of measuring $\ket{0}$ for the state $V^{\dagger}U\ket{0}$ . This also equals the expectation value $\operatorname{tr}(\rho H)$ for the state $\rho=V^{\dagger}U|0\rangle\!\langle 0|U^{\dagger}V$ and the diagonal Hamiltonian $H=|0\rangle\!\langle 0|$ . Thus, we can use $\overline{\operatorname{CVaR}}$ to get an upper bound on the noise-free fidelity. Here, the resulting random variable follows a Bernoulli distribution: it equals one if the all-zero bit string is measured and zero for all other outcomes. Since the variance of the CVaR for a Bernoulli random variable scales with $1/\alpha$ , see Sec. III, we can set $\alpha=1/\sqrt{\gamma}$ and use Eq. (6) to upper bound the fidelity with a sampling overhead of $\sqrt{\gamma}$ , compared to the $\gamma^{2}$ required by PEC to obtain an unbiased estimate.
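For this Bernoulli variable the upper CVaR has a closed form. If the all-zero outcome has noisy probability $q$ , then $\overline{\operatorname{CVaR}}_{\alpha}=\min(1,q/\alpha)$ , since the upper $\alpha$ -tail contains only ones when $q\geq\alpha$ , and otherwise mass $q$ at one and $\alpha-q$ at zero. With $\alpha=1/\sqrt{\gamma}$ the bound becomes $\min(1,\sqrt{\gamma}\,q)$ , consistent with Eq. (2) applied to the all-zero bit string. A minimal sketch that uses a plug-in estimate of $q$ from hypothetical counts and ignores its statistical error:

```python
import math

def fidelity_upper_bound(zero_counts, shots, gamma):
    """Upper CVaR at alpha = 1/sqrt(gamma) of the indicator 'all-zero string measured'."""
    q = zero_counts / shots            # plug-in estimate of the noisy probability of |0...0>
    alpha = 1.0 / math.sqrt(gamma)
    return min(1.0, q / alpha)         # upper CVaR of a Bernoulli(q) random variable

bound = fidelity_upper_bound(300, 1000, gamma=4.0)   # 0.6 = sqrt(4) * 0.3
```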

QSVMs leverage a quantum feature map to define a quantum kernel and can provably outperform classical methods on certain tasks Liu_2021 . The quantum feature map is a parameterized quantum circuit that takes a classical feature vector $x$ as input to prepare a corresponding quantum state $\ket{\phi(x)}$ . The corresponding quantum kernel is then defined via the Hilbert-Schmidt inner product of the density matrices of $\ket{\phi(x_{1})}$ and $\ket{\phi(x_{2})}$ for two classical data points $x_{1},x_{2}$ from some training set, which equals $\mathcal{F}(\ket{\phi(x_{1})},\ket{\phi(x_{2})})$ , and thus falls exactly into the case above.

VarQTE for real or imaginary time evolution assumes a given parametrized quantum state $\ket{\psi(\theta)}$ and then projects the exact state evolution onto the parameter evolution of the ansatz. This approximates the desired time evolution in the subspace that the ansatz can represent. The exact projection requires the evaluation of the quantum geometric tensor (QGT) McArdle_2019_varqte ; Yuan_2019_varqte ; Zoufal_2023_varqte_error_bounds , which quickly becomes prohibitive as the number of parameters increases. Thus, multiple approximate variants of VarQTE have been proposed that work around the evaluation of the QGT Gacon_2021_qnspsa ; gacon2023stochastic ; gacon2023variational . Many of these approximations leverage that the Hessian of the fidelity $|\braket{\psi(\theta)}{\psi(\theta+\delta\theta)}|^{2}$ with respect to $\delta\theta$ , evaluated at $\delta\theta=0$ , is proportional to (the real part of) the QGT of $\ket{\psi(\theta)}$ . They either use Simultaneous Perturbation Stochastic Approximation (SPSA) to estimate this Hessian from fidelity evaluations and use it as an approximation of the QGT, or they construct alternative loss functions that directly leverage the fidelity without constructing an approximate QGT. In all variants, the parameter perturbations $\delta\theta$ are small, which implies fidelities close to one. This is the regime where the noisy CVaR is very close to the noise-free expectation value, i.e., the sweet spot of the introduced approximation.
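The relation between the fidelity and the QGT can be checked on a toy ansatz. The sketch below assumes a hypothetical single-qubit ansatz $\ket{\psi(\theta)}=R_{Z}(\theta_{2})R_{Y}(\theta_{1})\ket{0}$ , whose Fubini-Study metric (the real part of the QGT) is $g=\tfrac{1}{4}\operatorname{diag}(1,\sin^{2}\theta_{1})$ . It estimates the Hessian of $|\braket{\psi(\theta)}{\psi(\theta+\delta\theta)}|^{2}$ at $\delta\theta=0$ by central finite differences, which up to discretization error equals $-2g$ .

```python
import numpy as np

def ansatz(theta):
    """Hypothetical ansatz RZ(theta[1]) RY(theta[0]) |0>."""
    a, b = theta
    return np.array([np.exp(-0.5j * b) * np.cos(a / 2), np.exp(0.5j * b) * np.sin(a / 2)])

def fidelity_hessian(theta, h=1e-3):
    """Central finite-difference Hessian of |<psi(theta)|psi(theta + d)>|^2 at d = 0."""
    theta = np.asarray(theta, dtype=float)
    psi = ansatz(theta)

    def fid(d):
        return abs(np.vdot(psi, ansatz(theta + d))) ** 2

    k = len(theta)
    steps = h * np.eye(k)
    hess = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            ei, ej = steps[i], steps[j]
            hess[i, j] = (fid(ei + ej) - fid(ei - ej) - fid(-ei + ej) + fid(-ei - ej)) / (4 * h * h)
    return hess

theta = [np.pi / 3, 0.7]
metric = 0.25 * np.diag([1.0, np.sin(theta[0]) ** 2])    # Fubini-Study metric of this ansatz
matches = np.allclose(fidelity_hessian(theta), -2 * metric, atol=1e-4)   # True
```

Since the fidelity stays close to one for small $\delta\theta$ , all fidelity evaluations in such a scheme fall into the regime where the noisy bounds are tight.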

V Experiments

Within this section, we analyze two optimization problems from the literature to demonstrate the theory presented in this paper. In both cases, we run QAOA circuits on ibm_sherbrooke ibm_quantum_devices : first, smaller but deeper circuits, and second, larger but shallower circuits. We consistently find good agreement between the theory and the experimental results. All results within this section are obtained without twirling the circuits. For a comparison and discussion of twirled and untwirled circuits, see Appendix A.

ibm_sherbrooke is a 127-qubit superconducting device with an echoed cross-resonance (ECR) gate as its two-qubit gate Sheldon2016 . This gate is equivalent to a CNOT gate up to single-qubit gates and has a fixed direction on the hardware. We let the transpiler take care of the mapping from CNOT gates to ECR gates and, for better readability, refer to CNOT gates in the following.

V.1 QAOA for MAXCUT on 3-regular graphs with 40 nodes

Figure 1: QAOA results on 40 qubits. The curves are the cumulative distribution functions resulting from sampling the circuits for a MAXCUT instance executed on ibm_sherbrooke for $p=1$ with $10^{5}$ shots (top) and $p=2$ with $10^{7}$ shots (bottom). The vertical lines show the corresponding noisy expectation values (dashed blue), the noise-free expectation values evaluated using light-cone optimized classical simulation (cyan dashed-dotted), the $\overline{\operatorname{CVaR}}_{\alpha_{p}}$ (cyan dotted), and the globally optimal solution equal to $56$ (green solid). The panel titles show the fitted $\alpha_{p}^{\prime}$ such that the $\overline{\operatorname{CVaR}}_{\alpha_{p}^{\prime}}$ equal the noise-free expectation values (i.e., cyan dashed-dotted).

In this section, we examine QAOA for MAXCUT on a random three-regular graph with 40 nodes, i.e., on 40 qubits. We take the problem instance from Ref. Sack2023 and optimize the parameters classically for QAOA with depth $p=1$ and $p=2$ using light-cone simplifications. This allows us to evaluate the required 2-local expectation values by simulating at most 14 qubits at a time, see details in Ref. Sack2023 . The circuits and optimal parameters are further discussed in Appendix E.

We apply staggered dynamic decoupling for error suppression, as discussed in Appendix F. The circuits are constructed such that they consist of only two different layers of CNOT gates on a line of 40 qubits, denoted by $q_{0},\ldots,q_{39}$ . The first layer is composed of 20 CNOT gates on qubits $(q_{i},q_{i+1})$ for even $i$ and the second of 19 CNOT gates on $(q_{i},q_{i+1})$ for odd $i$ . Using the technique introduced in Ref. mckay2023benchmarking , the measured LFs for these two layers are $LF_{1}=0.7686$ and $LF_{2}=0.7444$ , respectively. (Footnote 1: At the time of writing, the experiment to measure layer fidelity is being implemented in Qiskit Experiments QiskitExperiments ; see https://github.com/Qiskit-Extensions/qiskit-experiments.) We take the geometric average over the total number of CNOT gates and derive a CNOT fidelity of $\mathcal{F}_{CX}=(LF_{1}\times LF_{2})^{1/39}=0.9858$ . This also allows us to compute the error per layered gate (EPLG) of Ref. mckay2023benchmarking as $1-\mathcal{F}_{CX}=0.0142$ . We further define $\gamma_{CX}=1/\mathcal{F}_{CX}^{2}=1.0290$ . In total, the circuits for $p=1$ and $p=2$ have $N_{1}=461$ and $N_{2}=922$ CNOT gates, respectively, all arranged in the aforementioned layers. We can thus compute the sampling overheads $\sqrt{\gamma_{p}}=\gamma_{CX}^{N_{p}/2}$ as $\sqrt{\gamma_{1}}=735.0$ and $\sqrt{\gamma_{2}}=540275.9$ , which correspond to $\alpha_{1}=1/\sqrt{\gamma_{1}}=1.361\times 10^{-3}$ and $\alpha_{2}=1/\sqrt{\gamma_{2}}=1.851\times 10^{-6}$ for $p=1$ and $p=2$ , respectively. A regularly measured EPLG evaluated over a chain of 100 qubits is provided for ibm_sherbrooke on the IBM Quantum Platform ibm_quantum_devices . At the time of the experiment, the backend reported an EPLG of $0.017$ , which is slightly higher than our measured EPLG. This is expected, since we restrict ourselves to a selected line of 40 qubits. In any case, the EPLG reported by the backend is a good first proxy to estimate the LF and the resulting $\gamma$ when executing a particular circuit on a device.
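The arithmetic above can be reproduced in a few lines. This sketch uses only quantities stated in the text (the two layer fidelities, the 39 CNOT gates they contain, and the total CNOT counts 461 and 922); the helper name `cnot_fidelity` is illustrative.

```python
import math


def cnot_fidelity(layer_fidelities, num_cnots):
    """Geometric-mean CNOT fidelity implied by the measured layer fidelities."""
    return math.prod(layer_fidelities) ** (1.0 / num_cnots)


f_cx = cnot_fidelity([0.7686, 0.7444], 20 + 19)  # LF_1, LF_2 over 39 CNOT gates
eplg = 1.0 - f_cx                                # error per layered gate
gamma_cx = 1.0 / f_cx**2

overheads = {}
for p, num_cnots in [(1, 461), (2, 922)]:
    sqrt_gamma = gamma_cx ** (num_cnots / 2)       # sampling overhead sqrt(gamma_p)
    overheads[p] = (sqrt_gamma, 1.0 / sqrt_gamma)  # (sqrt(gamma_p), alpha_p)

print(f"F_CX={f_cx:.4f}  EPLG={eplg:.4f}  gamma_CX={gamma_cx:.4f}")
for p, (sqrt_gamma, alpha) in overheads.items():
    print(p, f"{sqrt_gamma:.1f}", f"{alpha:.3e}")
```

Because the inputs are rounded to four digits and the overhead is an exponential in the gate count, the printed $\sqrt{\gamma_{p}}$ can differ from the quoted values in the last digits (at the $10^{-3}$ relative level).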

	$p=1$	$p=2$
global optimum	56
$\mathbb{E}[\widetilde{X}]$	30.2	29.9
$\mathbb{E}[X]$	41.5	45.3
$\overline{\operatorname{CVaR}}_{\alpha_{p}}$	43.1	48.5
best sampled value	47	50
number of CNOT gates	461	922
$\sqrt{\gamma_{p}}$	$735.0$	$540275.9$
$\alpha_{p}$	$1.361\times 10^{-3}$	$1.851\times 10^{-6}$
$\alpha_{p}^{\prime}$	$5.180\times 10^{-3}$	$1.071\times 10^{-4}$
$\gamma_{CX}$	1.0290
${\gamma}_{CX,p}^{\prime}$	$1.0231$	$1.0200$

Table 1: QAOA results on 40 qubits. This table shows the results for $p=1$ and $p=2$ when running QAOA on the introduced 40-qubit MAXCUT instance: the noisy and noise-free expectation values, the CVaR estimates, the best sampled values, and the globally optimal value. Further, it shows the total number of CNOT gates, the overall $\sqrt{\gamma_{p}}$ for the circuits, the $\alpha_{p}$ derived from the LF, the $\alpha_{p}^{\prime}$ derived from calibrating the CVaR on the noise-free expectation values, and the corresponding $\gamma_{CX}$ and $\gamma_{CX,p}^{\prime}$ .

To apply the CVaR bounds, we run the circuits for $p=1$ with $10^{5}$ shots and for $p=2$ with $10^{7}$ shots. After sorting the samples and keeping the best $\alpha_{1}$ and $\alpha_{2}$ fractions, 137 and 19 samples, respectively, remain to estimate the CVaR. The data confirm that $\overline{\operatorname{CVaR}}_{\alpha_{p}}$ provides an upper bound (since MAXCUT is a maximization problem) to the noise-free expectation values, as predicted, see Fig. 1 and Tab. 1. The CVaR upper bound exceeds the noise-free value by $3.9\%$ for $p=1$ and by $7.1\%$ for $p=2$ .
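A minimal sketch of the upper-tail CVaR estimate for a maximization problem: sort the sampled objective values and average the best $\lceil\alpha n\rceil$ of them. Rounding $\alpha n$ up is an assumption here; it reproduces the 137 and 19 retained samples quoted above, whereas the estimator analyzed in Appendix C uses $\lfloor\alpha n\rfloor$. The helper name `upper_cvar` is hypothetical.

```python
import math

import numpy as np


def upper_cvar(values, alpha):
    """Mean of the best (largest) ceil(alpha * n) sampled values and the count kept."""
    best_first = np.sort(np.asarray(values, dtype=float))[::-1]  # descending order
    k = max(1, math.ceil(alpha * len(best_first)))
    return float(best_first[:k].mean()), k


# Retained samples for the two experiments, assuming alpha * shots is rounded up.
print(math.ceil(1e5 * 1.361e-3), math.ceil(1e7 * 1.851e-6))  # 137 19

# Toy usage on ten hypothetical cut values: keeps 10, 9, 8.
print(upper_cvar(list(range(1, 11)), alpha=0.25))  # (9.0, 3)
```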

We also use the noise-free expectation values obtained from the light-cone simulation to calibrate an $\alpha$ such that the CVaR matches the noise-free result exactly, denoted by $\alpha_{p}^{\prime}$ . This allows us to derive an induced effective $\gamma_{CX,p}^{\prime}$ via $\alpha_{p}^{\prime}=(\gamma_{CX,p}^{\prime})^{-N_{p}/2}$ , where $N_{p}$ is the number of CNOT gates, and to compare it to the true $\gamma_{CX}$ . We find that $\gamma_{CX,p}^{\prime}$ is quite stable across $p$ and significantly smaller than $\gamma_{CX}$ , see Tab. 1. This may imply that the observable of interest is not affected by all errors that may occur. Crucially, this observation may allow us to calibrate $\alpha$ for a particular application and choose larger values than implied by the LF, e.g., by running circuits of similar structure but with known noise-free results. This may reduce the sampling overhead in certain scenarios while still achieving good results. However, the lower/upper bounds proven in Sec. III are in general no longer guaranteed for $\alpha>1/\sqrt{\gamma}$ .
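The calibration can be sketched as follows, assuming $\alpha_{p}^{\prime}$ is restricted to the grid $k/n$ of retained-sample fractions and that the effective per-gate factor is defined through $\alpha_{p}^{\prime}=(\gamma_{CX,p}^{\prime})^{-N_{p}/2}$ ; this definition reproduces the $\gamma_{CX,p}^{\prime}$ values in Tab. 1 and Tab. 2. The function names are illustrative.

```python
import numpy as np


def calibrate_alpha(values, target):
    """Smallest alpha' = k / n such that the mean of the best k samples is <= target.

    For a maximization problem the upper-tail mean is non-increasing in k, so the first
    crossing is well defined; returns None if even the full sample mean exceeds target.
    """
    best_first = np.sort(np.asarray(values, dtype=float))[::-1]
    tail_means = np.cumsum(best_first) / np.arange(1, len(best_first) + 1)
    hits = np.nonzero(tail_means <= target)[0]
    if hits.size == 0:
        return None
    return (hits[0] + 1) / len(best_first)


def effective_gamma_cx(alpha_prime, num_cnots):
    """Per-CNOT gamma' implied by alpha' = gamma'^(-num_cnots / 2)."""
    return alpha_prime ** (-2.0 / num_cnots)


print(calibrate_alpha(np.arange(1, 101), target=95.5))  # 0.1
print(round(effective_gamma_cx(5.180e-3, 461), 4))      # 1.0231
```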

Comparing the $\overline{\operatorname{CVaR}}_{\alpha_{p}}$ and the best samples with the globally optimal solution, we find that they achieve approximation ratios of $0.770$ (CVaR) and $0.839$ (best sample) for $p=1$ , and $0.866$ (CVaR) and $0.893$ (best sample) for $p=2$ . All these numbers exceed the corresponding theoretical lower bounds of $0.692$ ( $p=1$ ) and $0.756$ ( $p=2$ ) discussed in Sec. IV.1.

V.2 QAOA on Hardware-efficient Higher-Order Ising Model with 127 variables

We now show results of running QAOA on higher-order spin glass models. Originally described in Refs. pelofske2023qavsqaoa ; pelofske2023short , these models are designed for a heavy-hex connectivity graph Chamberland_2020 of ibm_sherbrooke.

We define a minimization problem for the following cost Hamiltonian corresponding to a random coefficient spin glass problem with cubic terms and a connectivity graph that is defined to be compatible with an arbitrary heavy-hex lattice graph $G=(V,E)$ , see Fig. 2:

	$\displaystyle H=$	$\displaystyle\sum_{v\in V}d_{v}\cdot Z_{v}+\sum_{(i,j)\in E}d_{i,j}\cdot Z_{i}\otimes Z_{j}$
		$\displaystyle+\sum_{l\in W}d_{l,n_{1}(l),n_{2}(l)}\cdot Z_{l}\otimes Z_{n_{1}(l)}\otimes Z_{n_{2}(l)}.$		(15)

As $G$ is a connected bipartite graph with vertices $V=\{0,\ldots,n-1\}$ , it is uniquely bipartitioned as $V=V_{2}\sqcup V_{3}$ with $E\subset V_{2}\times V_{3}$ , where $V_{i}$ consists of vertices of degree at most $i$ . In (15), $W\subseteq V_{2}$ denotes the subset of vertices in $V_{2}$ of degree exactly $2$ , and each node $l\in W$ has two neighbors, denoted by $n_{1}(l)$ and $n_{2}(l)$ . The coefficients $d_{v}$ , $d_{i,j}$ , and $d_{l,n_{1}(l),n_{2}(l)}$ are the random linear, quadratic, and cubic coefficients, respectively, each chosen from $\{+1,-1\}$ with equal probability. An example of such a random higher-order Ising model is shown in Fig. 2.
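To make (15) concrete, the sketch below evaluates the classical cost of a spin configuration and brute-forces the optimum on a hypothetical five-vertex graph (two degree-3 vertices joined by three paths through degree-2 vertices), not on a device layout. Spins $z_v\in\{+1,-1\}$ are the eigenvalues of $Z_v$ , i.e., bit $b$ maps to $z=1-2b$ ; the seed and vertex labels are arbitrary.

```python
import itertools

import numpy as np

# Hypothetical toy graph: degree-3 vertices "a", "b" are joined by three paths through
# the degree-2 vertices "c1", "c2", "c3", so V3 = {a, b} and W = {c1, c2, c3}.
edges = [("a", "c1"), ("c1", "b"), ("a", "c2"), ("c2", "b"), ("a", "c3"), ("c3", "b")]
nodes = sorted({v for edge in edges for v in edge})
neighbors = {v: [u for edge in edges if v in edge for u in edge if u != v] for v in nodes}
W = ["c1", "c2", "c3"]

rng = np.random.default_rng(1)
d_lin = {v: int(rng.choice([-1, 1])) for v in nodes}
d_quad = {edge: int(rng.choice([-1, 1])) for edge in edges}
d_cub = {l: int(rng.choice([-1, 1])) for l in W}


def energy(z):
    """Value of H in (15) for spins z[v] in {+1, -1}."""
    value = sum(d_lin[v] * z[v] for v in nodes)
    value += sum(d_quad[(i, j)] * z[i] * z[j] for i, j in edges)
    value += sum(d_cub[l] * z[l] * z[neighbors[l][0]] * z[neighbors[l][1]] for l in W)
    return value


# Brute force over all 2^5 spin configurations (only feasible for tiny instances).
best = min(itertools.product((1, -1), repeat=len(nodes)),
           key=lambda spins: energy(dict(zip(nodes, spins))))
print(dict(zip(nodes, best)), energy(dict(zip(nodes, best))))
```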

We use the qubits in $V_{2}$ as parity qubits: the parities of the $ZZ$ and $ZZZ$ terms that contain a given qubit in $V_{2}$ are computed into and uncomputed from that qubit. The unitaries $e^{-i\gamma ZZ}$ and $e^{-i\gamma ZZZ}$ are then realized with $R_{z}(2\gamma)$ rotations on these parity qubits. Computing and uncomputing parities requires $1+1$ and $2+2$ CNOT gates for the quadratic and cubic terms, respectively; however, the CNOT gates for $Z_{l}Z_{n_{1}(l)}$ and $Z_{l}Z_{n_{2}(l)}$ can be subsumed into the CNOT gates for $Z_{l}Z_{n_{1}(l)}Z_{n_{2}(l)}$ .
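The parity construction for a cubic term can be checked with dense matrices. The sketch assumes the convention $R_{z}(\theta)=e^{-i\theta Z/2}$ , places the parity qubit $l$ at index 0 and its neighbors at indices 1 and 2, and verifies that two computing CNOTs, $R_{z}(2\gamma)$ on $l$ , and two uncomputing CNOTs reproduce $e^{-i\gamma Z\otimes Z\otimes Z}$ ; the value of $\gamma$ is arbitrary.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]])
P0 = np.diag([1, 0])  # |0><0|
P1 = np.diag([0, 1])  # |1><1|


def op(single_qubit_ops, n):
    """Tensor product over n qubits (qubit 0 leftmost); missing qubits get the identity."""
    out = np.array([[1.0]])
    for q in range(n):
        out = np.kron(out, single_qubit_ops.get(q, I2))
    return out


def cnot(control, target, n):
    return op({control: P0}, n) + op({control: P1, target: X}, n)


def rz(theta, q, n):
    """R_z(theta) = exp(-i theta Z / 2) on qubit q."""
    return op({q: np.diag([np.exp(-0.5j * theta), np.exp(0.5j * theta)])}, n)


gamma, n = 0.37, 3
l, n1, n2 = 0, 1, 2  # qubit l holds the parity; n1, n2 are its two neighbors
compute = cnot(n2, l, n) @ cnot(n1, l, n)                   # 2 CNOTs: parity into l
circuit = compute.conj().T @ rz(2 * gamma, l, n) @ compute  # 2 more CNOTs uncompute it

# exp(-i gamma Z x Z x Z) is diagonal with entries exp(-i gamma (-1)^(b0 + b1 + b2)).
parity_sign = np.array([(-1) ** bin(b).count("1") for b in range(2**n)])
target = np.diag(np.exp(-1j * gamma * parity_sign))
print(np.allclose(circuit, target))  # True
```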

Furthermore, $G$ , as a bipartite graph of maximum degree 3, admits a 3-edge-coloring by Kőnig’s line coloring theorem. Hence, these $2+2$ CNOT gates can be scheduled simultaneously for all terms in just $3+3$ non-overlapping layers pelofske2023qavsqaoa . Depth- $p$ QAOA circuits for these problems thus have a CNOT depth of only $6p$ , independent of the system size $n$ . Further circuit details are given in Appendix G.
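The layer scheduling rests on a constructive proof of Kőnig’s theorem. A minimal sketch, assuming a simple bipartite graph given as an edge list: each edge receives a color that is free at both endpoints, after swapping two colors along an alternating path when necessary; each color class is then a matching, i.e., one layer of simultaneously executable CNOT gates. This is a textbook algorithm, not necessarily the scheduler used in Ref. pelofske2023qavsqaoa .

```python
from collections import defaultdict


def bipartite_edge_coloring(edges, num_colors):
    """Properly color the edges of a simple bipartite graph using num_colors >= max degree."""
    at = defaultdict(dict)  # at[v][c] = neighbor joined to v by the edge with color c

    def free(v):
        return next(c for c in range(num_colors) if c not in at[v])

    for u, v in edges:
        a, b = free(u), free(v)
        if a in at[v]:
            # Follow the path from v whose colors alternate a, b, a, ... and swap a <-> b.
            # In a bipartite graph this path cannot reach u, so afterwards a is free at u and v.
            path, x, c = [], v, a
            while c in at[x]:
                y = at[x][c]
                path.append((x, y, c))
                x, c = y, (b if c == a else a)
            for x, y, c in path:
                del at[x][c], at[y][c]
            for x, y, c in path:
                swapped = b if c == a else a
                at[x][swapped], at[y][swapped] = y, x
        at[u][a], at[v][a] = v, u

    color_of = {(x, y): c for x in at for c, y in at[x].items()}
    return {e: color_of[e] for e in edges}


def cnot_layers(edges, num_colors=3):
    """Group edges by color: each group is a matching, i.e. one layer of parallel CNOTs."""
    layers = defaultdict(list)
    for edge, color in bipartite_edge_coloring(edges, num_colors).items():
        layers[color].append(edge)
    return [layers[c] for c in sorted(layers)]


# Hypothetical toy graph: degree-3 vertices "a", "b" linked through degree-2 vertices.
theta_graph = [("a", "c1"), ("c1", "b"), ("a", "c2"), ("c2", "b"), ("a", "c3"), ("c3", "b")]
for layer in cnot_layers(theta_graph):
    print(layer)
```

Applying the layers once to compute and once (in reverse) to uncompute gives the $3+3$ CNOT layers per QAOA step.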

Leveraging parameter transfer of QAOA angles between problems with the same structure but varying numbers of qubits allows us to obtain good angles for these $127$ -qubit QAOA circuits for $p=1,\ldots,5$ without on-device variational learning heavy_hex_QAOA_parameter_transfer2023 . Additionally, we use converged MPS simulations with a bond dimension of $\chi=2048$ to verify that the fixed QAOA angles produce good expectation values for all circuits heavy_hex_QAOA_parameter_transfer2023 . The hardware-compatible circuits are run on the ibm_sherbrooke device, again using staggered dynamic decoupling for error suppression, see Appendix F. The optimal solutions of the higher-order Ising models were computed using CPLEX cplexv12 ; heavy_hex_QAOA_parameter_transfer2023 .

As in Sec. V.1, we only have a small number of unique layers of CNOT gates. Since we need to cover a graph of degree three, we need at least three layers, see Appendix G, with 144 CNOT gates in total. The measured LFs for the three layers are $LF_{1}=0.056926$ , $LF_{2}=0.029630$ , and $LF_{3}=0.167959$ . These fidelities are significantly smaller than for the 40-qubit circuits in Sec. V.1. The reason is that the qubits and gates on a 127-qubit device are not uniform; some are better and some are worse. For 40 qubits, we could select the best line of 40 qubits (see Appendix E), while for 127 qubits we had to use the whole chip. From this, we again compute the CNOT fidelity $\mathcal{F}_{CX}=(LF_{1}\times LF_{2}\times LF_{3})^{1/144}=(0.000283)^{1/144}=0.944850$ , $EPLG=0.055150$ , and $\gamma_{CX}=1.120146$ . The results for evaluating the circuits on ibm_sherbrooke , each with $10^{5}$ shots, are provided in Fig. 3 and Tab. 2. Because of the significantly lower fidelities, the number of shots required to apply the analytic CVaR bounds is significantly higher and currently impractical. However, as in Sec. V.1, the effective $\gamma_{CX,p}^{\prime}$ is significantly smaller than $\gamma_{CX}$ , even smaller than for the longer 40-qubit circuits. Further, the noisy expectation values keep improving from $p=1$ to $p=4$ and only start to get worse at $p=5$ .

Finally, we use bootstrapping to confirm the scaling of the CVaR variance with respect to $\alpha$ . More precisely, we resample $10^{5}$ values uniformly with replacement from the results collected on ibm_sherbrooke and estimate the CVaR for the five values of $\alpha_{p}^{\prime}$ reported in Tab. 2. We repeat this $10^{4}$ times to estimate the variance of the resulting CVaR estimators. The results are provided in Fig. 4 and show close agreement with the theory presented in Sec. III.
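The bootstrap procedure can be sketched as below. Since the device samples are not reproduced here, the sketch uses hypothetical standard-normal stand-in data and a lower-tail CVaR (the spin glass is a minimization problem); resampling is with replacement, and the number of resamples is a small illustrative choice rather than the $10^{4}$ used above.

```python
import math

import numpy as np


def lower_cvar(values, alpha):
    """Mean of the smallest ceil(alpha * n) values (lower tail, for minimization)."""
    k = max(1, math.ceil(alpha * len(values)))
    return float(np.sort(values)[:k].mean())


def bootstrap_cvar_variance(samples, alpha, num_resamples=200, seed=0):
    """Bootstrap estimate of the variance of the CVaR estimator at level alpha."""
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples, dtype=float)
    estimates = [
        lower_cvar(rng.choice(samples, size=samples.size, replace=True), alpha)
        for _ in range(num_resamples)
    ]
    return float(np.var(estimates, ddof=1))


# Hypothetical stand-in data; the experiment resamples the measured objective values instead.
samples = np.random.default_rng(42).normal(size=10_000)
for alpha in (0.5, 0.1, 0.01):
    print(alpha, bootstrap_cvar_variance(samples, alpha))
```

The estimated variance grows markedly as $\alpha$ decreases, since fewer samples enter the tail average.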

$p$	$\#\text{CNOT}$	$\operatorname{tr}(\rho H)$	$\operatorname{tr}(\widetilde{\rho}H)$	$f_{\text{best}}$	$\sqrt{\gamma_{p}}$	$\alpha_{p}$	$\alpha_{p}^{\prime}$	$\gamma_{CX,p}^{\prime}$
1	288	-79.79	-64.54	-136	$1.246\times 10^{7}$	$8.026\times 10^{-8}$	$0.4602$	$1.0054$
2	576	-109.35	-81.11	-154	$1.553\times 10^{14}$	$6.441\times 10^{-15}$	$0.1310$	$1.0071$
3	864	-125.37	-86.97	-154	$1.935\times 10^{21}$	$5.169\times 10^{-22}$	$0.0305$	$1.0081$
4	1152	-137.22	-88.46	-156	$2.410\times 10^{28}$	$4.149\times 10^{-29}$	$0.0059$	$1.0090$
5	1440	-145.54	-85.78	-164	$3.003\times 10^{35}$	$3.330\times 10^{-36}$	$0.0011$	$1.0096$

Table 2: QAOA results on 127 qubits. This table shows the results for $p=1,\ldots,5$ when running QAOA on the introduced 127-qubit spin glass instance: the number of CNOT gates per circuit, the noise-free and noisy expectation values, and the best sampled values. Further, it shows the overall $\sqrt{\gamma_{p}}$ for the circuits and the corresponding $\alpha_{p}$ derived from the LF, as well as the $\gamma_{CX,p}^{\prime}$ and $\alpha_{p}^{\prime}$ derived from calibrating the CVaR on the noise-free expectation values.

VI Conclusion

We examined how hardware noise affects the quality of bit strings sampled from quantum circuits on noisy quantum computers. We proved and demonstrated that the noise can be compensated by increasing the number of samples inversely proportional to the circuit’s layer fidelity, or equivalently, proportional to $\sqrt{\gamma}$ . This is considerably less than the overhead of error mitigation strategies like probabilistic error cancellation, which scales as $\gamma^{2}$ , although the latter yields unbiased estimators of expectation values instead of bounds. Furthermore, we proved that the Conditional Value at Risk of noisy samples provides bounds on noise-free expectation values, providing the theoretical foundation for CVaR as a loss function in variational algorithms and thus closing a gap in the literature. We also discussed the potential of this theory to benefit other algorithms, such as Quantum Support Vector Machines or approximate Variational Quantum Time Evolution.

Our primary focus was on errors occurring during circuit execution. However, other error sources, notably State Preparation and Measurement (SPAM) errors, also affect performance on noisy devices. The methodologies developed in this paper can be adapted to account for SPAM errors, either by increasing the sampling overhead or by applying other mitigation techniques, like statistical readout error mitigation. The latter may allow certain errors to be mitigated without added sampling overhead but might require additional calibration circuits. Investigating the impact of SPAM errors remains an intriguing direction for future research.

Acknowledgments. The authors want to thank Almudena Carrera Vazquez, Julien Gacon, Youngseok Kim, David McKay, Diego Ristè, David Sutter, Kristan Temme, Minh Tran, and James Wootton for insightful discussions and recommendations to improve the theoretical and experimental results as well as the whole manuscript. Further, M.L. and S.W. acknowledge the support of the Swiss National Science Foundation, SNF grant No. 214919. E.P., A.B., and S.E. acknowledge the support of (i) the Beyond Moore’s Law thrust of the Advanced Simulation and Computing Program (NNSA ASC) at Los Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC, for the National Nuclear Security Administration of U.S. Department of Energy (Contract No. 89233218CNA000001), and (ii) LANL’s Institutional Computing program. LANL report LA-UR-23-33295.

Appendix A Assumption of Pauli noise

Within the theory of the paper, we made the simplifying assumption of Pauli noise. This assumption does not hold in general. Consider a Clifford quantum circuit layer $\mathcal{U}(\cdot)=U\cdot U^{\dagger}$ on $n$ qubits and its noisy version $\widetilde{\mathcal{U}}=\Lambda\circ\mathcal{U}$ . A more realistic description of the noise is given by

$\displaystyle\Lambda(\rho)=\sum_{i}A_{i}\rho A_{i}^{\dagger}\,,$		(16)

where the $A_{i}$ are Kraus operators nielsen_and_chuang , which leads to

$\displaystyle\widetilde{\mathcal{U}}(\rho)=\sum_{i}A_{i}U\rho U^{\dagger}A_{i}^{\dagger}\,.$		(17)

Applying Pauli twirling knill_randomized_2008 ; dankert_exact_2009 ; magesan_scalable_2011 , i.e., averaging over $\mathcal{U}$ conjugated by each element of the Pauli group on $n$ qubits yields

$\displaystyle\widetilde{\mathcal{U}}_{\text{twirled}}(\rho)=\frac{1}{4^{n}}\sum_{i,j}Q_{j}A_{i}UP_{j}\rho P_{j}U^{\dagger}A_{i}^{\dagger}Q_{j}\,,$		(18)

for Paulis $P_{j},Q_{j}$ with $Q_{j}UP_{j}=U$ for all $j=1,\ldots,4^{n}$ . It is known that this converts, on average, the general noise in (17) into a Pauli noise model as given in (1). In practice, we do not enumerate all $4^{n}$ Paulis, but uniformly sample a number of random Paulis to approximate the average.
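The effect of twirling can be illustrated on a single qubit. The sketch assumes $U=I$ (so $Q_{j}=P_{j}$ ) and a coherent $R_{z}(\epsilon)$ over-rotation as the noise channel, and compares Pauli transfer matrices: the untwirled channel has off-diagonal entries, while the twirled one is diagonal and coincides with a dephasing channel with error probability $\sin^{2}(\epsilon/2)$ . Unlike the sampled twirl used in practice, the sketch enumerates all four Paulis.

```python
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.diag([1, -1]).astype(complex)
PAULIS = [I, X, Y, Z]


def apply_kraus(kraus, rho):
    return sum(A @ rho @ A.conj().T for A in kraus)


def twirled(kraus, rho):
    """Pauli-twirled channel (1/4) sum_P P Lambda(P rho P) P for one qubit with U = I."""
    return sum(P @ apply_kraus(kraus, P @ rho @ P) @ P for P in PAULIS) / 4


def ptm(channel):
    """Pauli transfer matrix R_ij = tr(P_i channel(P_j)) / 2."""
    return np.array([[0.5 * np.trace(Pi @ channel(Pj)).real for Pj in PAULIS] for Pi in PAULIS])


eps = 0.3
coherent = [np.diag([np.exp(-0.5j * eps), np.exp(0.5j * eps)])]  # R_z(eps) over-rotation
R_raw = ptm(lambda r: apply_kraus(coherent, r))
R_tw = ptm(lambda r: twirled(coherent, r))
print(np.round(R_raw, 3))
print(np.round(R_tw, 3))
```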

Suppose now we have a noise model that, on average, looks like Pauli noise. Then, expectation values $\operatorname{tr}(\rho H)$ take the same value for a true Pauli noise model and for the twirled general model. This also holds if we set $H=|x\rangle\!\langle x|$ , i.e., if we evaluate the probability of sampling $\ket{x}$ . Consequently, the sampling probabilities under the actual Pauli noise model and under the twirled noise model must be the same.

For the experiments in Sec. V, we omitted twirling. There are some special cases of noise models for which the theory holds exactly as stated. For instance, consider stochastic noise wallman2016bounding with $A_{0}\propto I$ and all other $A_{i}$ , $i>0$ , orthogonal to $A_{0}$ . Then it is easy to see that the probability of having no error equals the corresponding probability for the Pauli noise resulting after twirling, i.e., $1/\sqrt{\gamma}$ . While we can always construct a noise channel with mutually orthogonal Kraus operators, it is not guaranteed that the identity is one of them. In general, we can only say that the probability of no error in the general noise model is less than or equal to $1/\sqrt{\gamma}$ Wallman_2016 ; wallman2016bounding .

However, the gap between the twirled and untwirled circuits appears to be very small in the considered cases. We demonstrate this by comparing the sampled distributions of the twirled and untwirled circuits. In Fig. 5, we show the experimental distributions obtained when sampling the same 127-qubit circuits discussed in Sec. V.2 on the ibm_sherbrooke device. They show close agreement with and without twirling.

We note that the observed distributions in Fig. 5 deviate slightly from those presented in Fig. 3. This is because twirling requires inserting additional single-qubit gates, which make the circuits slightly deeper, here about 8% longer in pulse schedule duration than the original circuits. In some cases, this could be reduced by merging the twirling gates with other single-qubit gates. However, this is not possible if the additional gates are inserted, e.g., between two CNOT gates. The untwirled circuits have the same structure as the twirled ones, except that the twirling gates are fixed instead of randomly sampled, so that both cases incur the same additional circuit duration and the comparison is fair.

We also note that the minimum objective values for the twirled case are lower than for the untwirled case. However, since the opposite holds for the mean objective value, we believe this may be due to sampling statistics, as in each case the minimum objective value was sampled only once. If we determine $\alpha_{p}^{\prime}$ as before for each case, we find that the twirled and untwirled values agree well for each $p$ and are well within one standard deviation of each other (determined by bootstrapping the observed bitstrings). This is summarized in Tab. 3.

p	Twirling	$\text{tr}(\rho H)$	$\text{tr}(\widetilde{\rho}H)$	$f_{\text{best}}$	$\alpha_{p}^{\prime}$
1	No	-79.8	-60.8	-128	0.147 (7.9%)
1	Yes	-79.8	-60.9	-144	0.152 (7.7%)
2	No	-109.4	-74.9	-144	0.0202 (22.4%)
2	Yes	-109.4	-72.9	-148	0.0160 (25.7%)

Table 3: Values of the objective function obtained with and without twirling for QAOA depths $p=1,2$ , corresponding to the distributions shown in Fig. 5. The noise-free values are the expectation values of the observable obtained using classical MPS simulations (rounded to one decimal place) heavy_hex_QAOA_parameter_transfer2023 . The standard deviations for $\alpha_{p}^{\prime}$ (shown as a percentage of the nominal value) are determined by bootstrapping over the observed bitstrings. We note that the $\alpha_{p}^{\prime}$ here are lower than those in Fig. 3 for the same reason that the observed values differ, as described in Appendix A. Nevertheless, the qualitative conclusions still hold, and the twirled and untwirled cases agree well.

Appendix B Probabilistic Error Cancellation & Sampling

In this section, we discuss how applying PEC berg2023probabilistic to quantum circuits affects the resulting sampling probabilities. PEC consists of two steps: learning the noise when running a quantum circuit on a particular quantum device, and then, mitigating the noise to get an unbiased estimator of an expectation value. Here, we assume we have learned the noise already and focus on the error mitigation. Given a noise model $\Lambda$ , PEC constructs a Quasiprobability Decomposition (QPD) to implement the inverse noise by combining multiple weighted quantum circuits.

In a QPD, a quantum operation $\mathcal{U}$ is implemented as a linear combination of other (possibly noisy) operations $\mathcal{E}_{i}$ , $i=1,\ldots,M$ ,

$\displaystyle\mathcal{U}(\cdot)=\sum_{i=1}^{M}a_{i}\mathcal{E}_{i}(\cdot)\,,$		(19)

where $a_{i}\in\mathbb{R}$ , $\mathcal{U}(X)=UXU^{\dagger}$ , $\mathcal{E}_{i}$ denote (noisy) operations, and $\sum_{i=1}^{M}a_{i}=1$ . This was first proposed in the context of error mitigation Temme_2017 , where $\mathcal{U}$ is assumed to be a noise-free operation and the $\mathcal{E}_{i}$ are noisy operations that can be implemented on a noisy device. If this is applied to multiple gates and qubits, the number of necessary operations $M$ grows exponentially. Thus, instead of enumerating all of them, one rewrites (19) as

$\displaystyle\mathcal{U}(\cdot)=\gamma\sum_{i=1}^{M}p_{i}s_{i}\mathcal{E}_{i}(\cdot)\,,$		(20)

where $\gamma=\|a\|_{1}\geq 1$ , $p_{i}=|a_{i}|/\gamma$ , and $s_{i}=\text{sign}(a_{i})$ , and samples from the probability distribution defined through $p_{i}$ . Suppose we are interested in estimating $\braket{H}=\operatorname{tr}(\mathcal{U}(\rho)H)$ for some initial state $\rho$ and observable $H$ . Then, we can use the QPD to write

$\displaystyle\operatorname{tr}(\mathcal{U}(\rho)H)=\gamma\sum_{i=1}^{M}p_{i}s_{i}\operatorname{tr}(\mathcal{E}_{i}(\rho)H).$		(21)

Thus, instead of enumerating all $M$ circuits, we can sample from $p_{i}$ , and only evaluate the sampled circuits corresponding to $i$ , to get an unbiased estimator for $\braket{H}$ . However, the variance of this estimation is amplified by $\gamma^{2}$ , i.e., $\gamma^{2}$ -times more samples are needed than for the original noise-free circuit to achieve an estimate of the same accuracy. The sampling overhead $\gamma^{2}$ grows exponentially in the number of qubits and depth of the circuit, and thus, can be prohibitively large for circuits beyond a certain circuit size and noise levels.
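The sampling procedure behind (21) can be sketched as a small Monte Carlo estimator, here for the probability of a fixed outcome $\ket{x}$ , i.e., $H=|x\rangle\!\langle x|$ . The quasi-probabilities $a_{i}$ and the per-circuit probabilities of observing $\ket{x}$ are hypothetical numbers. Each sample draws a circuit index from $p_{i}$ , draws a single-shot indicator of $\ket{x}$ , and returns $\gamma s_{i}$ times that indicator. Because every sample has magnitude at most $\gamma$ , its variance is at most $\gamma^{2}$ , reflecting the $\gamma^{2}$ sampling overhead.

```python
import numpy as np


def qpd_estimate(a, probs_x, num_samples, seed=0):
    """Monte Carlo estimate of sum_i a_i * probs_x[i] by sampling circuits from p_i = |a_i| / gamma.

    A single shot of circuit i yields the outcome x with probability probs_x[i];
    each sample contributes gamma * sign(a_i) * indicator(outcome == x).
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    gamma = np.abs(a).sum()
    p = np.abs(a) / gamma
    idx = rng.choice(len(a), size=num_samples, p=p)           # sampled circuit indices
    hit = rng.random(num_samples) < np.asarray(probs_x)[idx]  # single-shot indicators of x
    values = gamma * np.sign(a[idx]) * hit
    return values.mean(), values.var(), gamma


a = [1.25, -0.15, -0.10]      # hypothetical quasi-probabilities, sum(a) = 1
probs_x = [0.40, 0.30, 0.20]  # hypothetical probability of x for each noisy circuit E_i
exact = float(np.dot(a, probs_x))
est, var, gamma = qpd_estimate(a, probs_x, 200_000)
print(exact, est, gamma, var)
```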

While PEC has only been considered for the estimation of expectation values, it also generates samples from every random circuit that is measured. However, we show that this essentially amplifies the noise and increases the sampling overhead compared to the results presented in this paper. To this end, we consider the following mixed state induced by PEC:

$\displaystyle\rho_{\text{PEC}}=\sum_{i=1}^{M}p_{i}\mathcal{E}_{i}(\rho),$		(22)

for some initial state $\rho$ . The state $\rho_{\text{PEC}}$ is obtained by dropping the factor $\gamma$ as well as the signs $s_{i}$ from (20). This allows us to state the following lemma.

Lemma 2.

Suppose an $n$ -qubit state $\rho=U|0\rangle\!\langle 0|U^{\dagger}$ , where $U$ is some unitary, with

$\displaystyle\operatorname{tr}(\rho|x\rangle\!\langle x|)=p_{x}\geq 0,$		(23)

for a computational basis state $\ket{x}$ , $x\in\{0,1\}^{n}$ .

Further, suppose that $U$ can be error-mitigated on a noisy device by using PEC with corresponding $\gamma\geq 1$ and denote the resulting mixed state introduced in (22) by $\rho_{\text{PEC}}$ . Then, the probability of measuring $\ket{x}$ on the noisy device using PEC is lower bounded by

$\displaystyle\operatorname{tr}(\rho_{\text{PEC}}|x\rangle\!\langle x|)=p_{x}^{\text{PEC}}\geq p_{x}/\gamma.$		(24)

Proof.

Consider the QPD resulting from PEC

$\displaystyle\mathcal{U}(\cdot)=\sum_{i=1}^{M}a_{i}\mathcal{E}_{i}(\cdot)\,.$		(25)

Using (25) we can write

$\displaystyle p_{x}=\operatorname{tr}(\rho|x\rangle\!\langle x|)=\sum_{i=1}^{M}a_{i}\operatorname{tr}(\mathcal{E}_{i}(|0\rangle\!\langle 0|)|x\rangle\!\langle x|)\,.$		(26)

By defining $\gamma=\|a\|_{1}$ , $p_{i}=|a_{i}|/\gamma$ , and $s_{i}=\text{sign}(a_{i})$ , we can rewrite (26) as

$\displaystyle p_{x}=\gamma\sum_{i=1}^{M}p_{i}s_{i}\operatorname{tr}(\mathcal{E}_{i}(|0\rangle\!\langle 0|)|x\rangle\!\langle x|)\,.$		(27)

Further, $s_{i}\operatorname{tr}(\mathcal{E}_{i}(|0\rangle\!\langle 0|)|x\rangle\!\langle x|)$ allows us to define a random variable $Y_{i}\in\{-1,0,+1\}$ that equals $\pm 1$ if we measure $\mathcal{E}_{i}(|0\rangle\!\langle 0|)$ and obtain $\ket{x}$ , where the sign is determined by $s_{i}$ , and $0$ otherwise. The random variable $Y_{i}$ satisfies $\mathbb{E}[Y_{i}]=s_{i}\operatorname{tr}(\mathcal{E}_{i}(|0\rangle\!\langle 0|)|x\rangle\!\langle x|)$ . We denote the probabilities of $Y_{i}$ taking the values $-1,0,+1$ by $q_{i}^{-1},q_{i}^{0},q_{i}^{+1}\geq 0$ , respectively. Note that, by construction, for each $i$ only one of $q_{i}^{-1},q_{i}^{+1}$ can be larger than zero.

In addition, let the probabilities $p_{i}$ define a random variable $I\in\{1,\ldots,M\}$ . Then, by the law of total expectation, we get

$\displaystyle\gamma\mathbb{E}[Y_{I}]$	$\displaystyle=\gamma\sum_{i=1}^{M}\mathbb{E}[Y_{I}\mid I=i]\,\mathbb{P}[I=i]$	(28)
	$\displaystyle=\gamma\sum_{i=1}^{M}p_{i}s_{i}\operatorname{tr}(\mathcal{E}_{i}(|0\rangle\!\langle 0|)|x\rangle\!\langle x|)$	(29)
	$\displaystyle=p_{x}\,.$	(30)

This can be rewritten as

$\displaystyle\sum_{i=1}^{M}p_{i}\left(q_{i}^{+1}-q_{i}^{-1}\right)=\frac{p_{x}}{\gamma}\,.$		(31)

The total probability to measure $\ket{x}$ when applying PEC, independent of the sign of $Y_{I}$ , is then given by

$\displaystyle\sum_{i=1}^{M}p_{i}\left(q_{i}^{+1}+q_{i}^{-1}\right)\geq\frac{p_{x}}{\gamma},$		(32)

where the lower bound follows immediately from (31) since $q_{i}^{+1}+q_{i}^{-1}\geq q_{i}^{+1}-q_{i}^{-1}$ , and the left-hand side is exactly the probability of measuring $\ket{x}$ for the state $\rho_{\text{PEC}}$ . ∎

Comparing the result of Lemma 2 with the lower bound presented in (2), we see that PEC incurs the square of the sampling overhead of direct sampling, i.e., a factor $\gamma$ instead of $\sqrt{\gamma}$ . This further implies that CVaR-based approaches may significantly reduce the overhead required to achieve insightful results, particularly when combined with problem structure to filter noisy samples.

Appendix C Variance of Estimating the CVaR

In this section, we present a short exposition on how to estimate the CVaR. We first state the following lemma.

Lemma 3.

Let $X_{1},\dots,X_{n}$ be i.i.d. copies of $X$ (with $X$ integrable) and let $X_{(1)},\dots,X_{(n)}$ be their order statistic. For $\alpha\in(0,1]$ let $E_{n}=(X_{(1)}+\cdots+X_{(\lfloor\alpha n\rfloor)})/\lfloor\alpha n\rfloor$ . Then

\displaystyle\mathbb{E}[E_{n}]\to\operatorname{CVaR}_{\alpha}(X)\quad\text{as }n\to\infty\,.

If $X$ is square integrable and $F_{X}(x_{\alpha})=\alpha$ ,

\displaystyle\sqrt{n}(E_{n}-\operatorname{CVaR}_{\alpha}(X))\to N(0,\operatorname{CVaRv}_{\alpha}(X))

in distribution as $n\to\infty$ , where $\operatorname{CVaRv}_{\alpha}(X):=\alpha^{-1}\operatorname{Var}[X\mid X\leq x_{\alpha}]$ is the limiting variance.

To estimate $\overline{\operatorname{CVaR}}_{\alpha}(X)$ , we use the estimator $\overline{E}_{n}=(X_{(n-\lfloor\alpha n\rfloor+1)}+\cdots+X_{(n)})/\lfloor\alpha n\rfloor$ and obtain analogous results.

Proof.

Recall $F_{X}(x)=\mathbb{P}[X\leq x]$ and define $F_{X}(x-)=\mathbb{P}[X<x]$ . We define the empirical cumulative distribution function and its left limit as follows:

	$\displaystyle\hat{F}_{n}(x)$	$\displaystyle=\#\{i\leq n\colon X_{i}\leq x\}/n\,,$
	$\displaystyle\hat{F}_{n}(x-)$	$\displaystyle=\#\{i\leq n\colon X_{i}<x\}/n\,.$

Also let $\Delta F_{X}(x)=F_{X}(x)-F_{X}(x-)$ and $\Delta\hat{F}_{n}(x)=\hat{F}_{n}(x)-\hat{F}_{n}(x-)$ . The key observation is that

\displaystyle E_{n}=\frac{1}{\lfloor\alpha n\rfloor}\sum_{i=1}^{n}X_{i}\min\left\{\frac{(\lfloor\alpha n\rfloor-n\hat{F}_{n}(X_{i}-))_{+}}{n\Delta\hat{F}_{n}(X_{i})},1\right\}\,.

Indeed, any $x\in\mathbb{R}$ will appear in the sum defining $\lfloor\alpha n\rfloor E_{n}$ precisely $\min\{(\lfloor\alpha n\rfloor-n\hat{F}_{n}(x-))_{+},n\Delta\hat{F}_{n}(x)\}$ times; the $n\Delta\hat{F}_{n}(X_{i})$ in the denominator above takes care of overcounting. Now

	$\displaystyle\mathbb{E}[E_{n}]=\frac{n}{\lfloor\alpha n\rfloor}\mathbb{E}\left[X_{1}\min\left\{\frac{(\lfloor\alpha n\rfloor-n\hat{F}_{n}(X_{1}-))_{+}}{n\Delta\hat{F}_{n}(X_{1})},1\right\}\right]$
	$\displaystyle=\frac{n}{\lfloor\alpha n\rfloor}\mathbb{E}\left[A_{n}(X_{1})\right]$

where

\displaystyle A_{n}(x):=x\mathbb{E}\left[\min\left\{\frac{(\lfloor\alpha n\rfloor-n\hat{F}_{n-1}(x-))_{+}}{1+n\Delta\hat{F}_{n-1}(x)},1\right\}\right]\,.

The first equality above follows from the linearity of the expectation and the i.i.d. property of $X_{1},\dots,X_{n}$ , and the second equality follows from conditioning on $X_{1}$ . Using the strong law of large numbers, we have $(\hat{F}_{n}(x),\hat{F}_{n}(x-),\Delta\hat{F}_{n}(x))\to(F_{X}(x),F_{X}(x-),\Delta F_{X}(x))$ a.s. as $n\to\infty$ . By separately considering the cases $\Delta F_{X}(x)=0$ and $\Delta F_{X}(x)>0$ , we get

	$\displaystyle A_{n}(x)$	$\displaystyle\to x\cdot 1(F_{X}(x)<\alpha)$
		$\displaystyle\qquad+x\frac{\alpha-F_{X}(x-)}{\Delta F_{X}(x)}\,1(\alpha\in(F_{X}(x-),F_{X}(x)))$

as $n\to\infty$ unless $\alpha=F_{X}(x)=F_{X}(x-)$ ; however we have $\mathbb{P}[\alpha=F_{X}(X_{1})=F_{X}(X_{1}-)]=0$ so this case does not matter to evaluate the limit of $\mathbb{E}[E_{n}]$ . Thus by dominated convergence

	$\displaystyle\mathbb{E}[E_{n}]$	$\displaystyle\to\alpha^{-1}\mathbb{E}[X_{1};F_{X}(X_{1})<\alpha]$
		$\displaystyle\qquad+\sum_{x\colon\alpha\in(F_{X}(x-),F_{X}(x))}x(1-\alpha^{-1}F_{X}(x-))$
		$\displaystyle=\operatorname{CVaR}_{\alpha}(X)$

as $n\to\infty$ . The second claim on the central limit theorem is a special case of cvarestimator_clt . ∎
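The estimators $E_{n}$ and $\overline{E}_{n}$ of Lemma 3 translate directly into code. As a sanity check, the sketch uses $X\sim\mathrm{Uniform}(0,1)$ , for which $\operatorname{CVaR}_{\alpha}(X)=\alpha/2$ and $\overline{\operatorname{CVaR}}_{\alpha}(X)=1-\alpha/2$ ; the sample sizes and seed are arbitrary.

```python
import numpy as np


def lower_cvar_estimator(samples, alpha):
    """E_n from Lemma 3: mean of the floor(alpha * n) smallest samples."""
    k = int(np.floor(alpha * len(samples)))
    if k == 0:
        raise ValueError("alpha * n must be at least 1")
    return float(np.sort(samples)[:k].mean())


def upper_cvar_estimator(samples, alpha):
    """overline E_n: mean of the floor(alpha * n) largest samples."""
    k = int(np.floor(alpha * len(samples)))
    if k == 0:
        raise ValueError("alpha * n must be at least 1")
    return float(np.sort(samples)[-k:].mean())


rng = np.random.default_rng(3)
alpha = 0.05
for n in (1_000, 100_000):
    u = rng.random(n)  # X ~ Uniform(0, 1): CVaR = alpha / 2, upper CVaR = 1 - alpha / 2
    print(n, lower_cvar_estimator(u, alpha), upper_cvar_estimator(u, alpha))
```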

Let us make the following remark on monotonicity: If $\phi\colon\mathbb{R}\to\mathbb{R}$ is non-decreasing, $\phi(X)$ is integrable, and $X^{\prime}$ denotes an independent copy of $X$ , then

	$\displaystyle 0$	$\displaystyle\geq\mathbb{E}[(\phi(X)-\phi(X^{\prime}))(1(X\leq x)-1(X^{\prime}\leq x))]$
		$\displaystyle=2\cdot\mathbb{E}[\phi(X);X\leq x]-2\cdot\mathbb{E}[\phi(X)]\mathbb{P}[X\leq x]\,.$

The inequality holds because, $\phi$ being non-decreasing, the two factors inside the expectation never have the same strict sign. By applying this to $\phi(x)=x$ and $x=x_{\alpha}$, we see that $\operatorname{CVaR}_{\alpha}(X)\leq\mathbb{E}[X]$. Furthermore, by replacing $X$ with a random variable sampled from the law of $X$ conditioned on $X\leq x_{\alpha^{\prime}}$ for $\alpha^{\prime}>\alpha$, we can deduce that $\operatorname{CVaR}_{\alpha}(X)$ is non-decreasing in $\alpha$. Much more crudely, we can bound $\operatorname{CVaRv}_{\alpha}(X)\leq\alpha^{-1}\mathbb{E}[X^{2}]/\mathbb{P}[X\leq x_{\alpha}]\leq\mathbb{E}[X^{2}]/\alpha^{2}$.
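Both monotonicity statements can be checked numerically on a finite distribution. The sketch below assumes a hypothetical four-point law and the lower-tail convention of this appendix; it verifies the covariance inequality for the non-decreasing choice $\phi(x)=x^{3}$ at every threshold, and tabulates $\alpha\mapsto\operatorname{CVaR}_{\alpha}(X)$ on a grid to confirm that it is non-decreasing and never exceeds $\mathbb{E}[X]$.

```python
def cvar(values, probs, alpha):
    """Lower-tail CVaR_alpha of a finite distribution (lowest alpha of mass)."""
    remaining, total = alpha, 0.0
    for v, p in sorted(zip(values, probs)):
        take = min(p, remaining)
        total += v * take
        remaining -= take
        if remaining <= 0:
            break
    return total / alpha


def phi(v):
    return v ** 3  # any non-decreasing function works here


# Hypothetical four-point law used only for illustration.
values = [-2.0, 0.0, 1.0, 3.0]
probs = [0.1, 0.4, 0.3, 0.2]
mean = sum(v * p for v, p in zip(values, probs))
e_phi = sum(phi(v) * p for v, p in zip(values, probs))

# E[phi(X); X <= x] <= E[phi(X)] P[X <= x] at every threshold x.
covariance_ok = all(
    sum(phi(v) * p for v, p in zip(values, probs) if v <= x)
    <= e_phi * sum(p for v, p in zip(values, probs) if v <= x) + 1e-12
    for x in values
)

# CVaR_alpha on a grid of alpha in (0, 1].
alphas = [i / 100 for i in range(1, 101)]
curve = [cvar(values, probs, a) for a in alphas]
monotone = all(c1 <= c2 + 1e-12 for c1, c2 in zip(curve, curve[1:]))
below_mean = all(c <= mean + 1e-12 for c in curve)
print(covariance_ok, monotone, below_mean, curve[-1], mean)
```

At $\alpha=1$ the whole distribution is averaged, so the last grid value coincides with $\mathbb{E}[X]$ up to rounding.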

In the following, we analyze the behavior of the limiting distribution of the estimator $E_{n}$ in some concrete cases.

In the case where $X$ has a Bernoulli distribution with success probability $p$, we observe that $\overline{E}_{n}$ has the same distribution as $\min\{B_{n}/\lfloor\alpha n\rfloor,1\}$, where $B_{n}$ is binomially distributed with parameters $(n,p)$. An application of the central limit theorem thus yields

	$\displaystyle\sqrt{n}(\overline{E}_{n}-\min\{p/\alpha,1\})$
	$\displaystyle\to\begin{cases}\alpha^{-1}\sqrt{p(1-p)}\,N&\colon\alpha>p,\\ \sqrt{(1-p)p^{-1}}\,N\cdot 1(N\leq 0)&\colon\alpha=p,\\ 0&\colon\alpha<p\,.\end{cases}$

in distribution as $n\to\infty$ where $N$ is a standard normal random variable.
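The regimes of this limit can be reproduced by simulating $\min\{B_{n}/\lfloor\alpha n\rfloor,1\}$ directly. The sketch below uses hypothetical parameters ($n=1000$, $p=0.25$) and standard-library random numbers. For $\alpha>p$ it compares the empirical variance of the scaled error with $p(1-p)/\alpha^{2}$; for $\alpha=p$ it checks that the scaled error is never positive (the estimator cannot exceed its limit $1$) and is exactly zero in roughly half of the runs.

```python
import math
import random
import statistics


def scaled_errors(n, p, alpha, reps, seed=0):
    """Draws of sqrt(n) * (min(B_n / floor(alpha n), 1) - min(p / alpha, 1))
    with B_n ~ Binomial(n, p), simulated as a sum of Bernoulli(p) variables."""
    rng = random.Random(seed)
    k = math.floor(alpha * n)
    limit = min(p / alpha, 1.0)
    out = []
    for _ in range(reps):
        b = sum(rng.random() < p for _ in range(n))
        out.append(math.sqrt(n) * (min(b / k, 1.0) - limit))
    return out


# Hypothetical parameters; p = 0.25 keeps floor(alpha * n) exact.
n, p, reps = 1000, 0.25, 1500
above = scaled_errors(n, p, alpha=0.5, reps=reps, seed=0)  # alpha > p
at = scaled_errors(n, p, alpha=p, reps=reps, seed=1)       # alpha = p

print("alpha > p: variance", statistics.variance(above),
      "vs", p * (1 - p) / 0.5 ** 2)
print("alpha = p: max", max(at),
      "fraction zero", sum(e == 0.0 for e in at) / reps)
```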

To analyze the case where $X=N\sim N(0,1)$ is standard normal, it will be useful to recall the following asymptotic expansion (nist_dlmf, §8.11(i)) of the incomplete Gamma function:

	$\displaystyle\Gamma(a,y)$	$\displaystyle:=\int_{y}^{\infty}s^{a-1}e^{-s}\,ds$
		$\displaystyle=y^{a-1}e^{-y}\left(\sum_{k=0}^{n-1}\frac{(a-1)\cdots(a-k)}{y^{k}}+O(y^{-n})\right)$

as $y\to\infty$ for any fixed $n\geq 1$ and $a>0$ . In particular as $x\to\infty$ ,

	$\displaystyle\frac{\Gamma(1/2,x^{2}/2)}{\sqrt{2}}$	$\displaystyle=\frac{e^{-x^{2}/2}}{x}\left(1-\frac{1}{x^{2}}+\frac{3}{x^{4}}+O(x^{-6})\right),$
	$\displaystyle\frac{\sqrt{2}}{\Gamma(1/2,x^{2}/2)}$	$\displaystyle=xe^{x^{2}/2}\left(1+\frac{1}{x^{2}}-\frac{2}{x^{4}}+O(x^{-6})\right).$
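Both truncated expansions can be checked against the identity $\Gamma(1/2,y)=\sqrt{\pi}\,\operatorname{erfc}(\sqrt{y})$ (substitute $s=t^{2}$ in the defining integral), using only Python's `math` module. The sketch below measures the relative error of each truncated series, which should shrink at the rate $x^{-6}$ of the omitted terms.

```python
import math


def gamma_half_upper(y):
    """Gamma(1/2, y) = sqrt(pi) * erfc(sqrt(y)) for y >= 0."""
    return math.sqrt(math.pi) * math.erfc(math.sqrt(y))


def relative_errors(x):
    """Relative errors of the two truncated expansions at x > 0."""
    exact = gamma_half_upper(x * x / 2) / math.sqrt(2)  # Gamma(1/2, x^2/2)/sqrt(2)
    series = math.exp(-x * x / 2) / x * (1 - 1 / x**2 + 3 / x**4)
    recip_series = x * math.exp(x * x / 2) * (1 + 1 / x**2 - 2 / x**4)
    # recip_series approximates sqrt(2) / Gamma(1/2, x^2/2) = 1 / exact
    return abs(series / exact - 1), abs(recip_series * exact - 1)


for x in (4.0, 6.0, 8.0):
    print(x, relative_errors(x))
```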

Let $x_{\alpha}=F_{N}^{-1}(\alpha)$ and write $f_{N}=F_{N}^{\prime}$ for the density of $N$. By (nist_dlmf, §7.17(iii)) we get the asymptotic relationship

\displaystyle x_{\alpha}\sim-\sqrt{-\log(4\pi\alpha^{2}\log(1/(2\alpha)))}

as $\alpha\to 0$ . We will compute $\operatorname{CVaR}_{\alpha}(N)$ and $\operatorname{CVaRv}_{\alpha}(N)$ via the cumulant generating function $\phi$ of a truncated Gaussian

	$\displaystyle\phi(\theta)$	$\displaystyle=\log\mathbb{E}[e^{\theta N}\mid N\leq x_{\alpha}]$
		$\displaystyle=-\log F_{N}(x_{\alpha})+\log\int_{-\infty}^{x_{\alpha}}\frac{e^{-t^{2}/2+\theta t}}{\sqrt{2\pi}}\,dt$
		$\displaystyle=-\log F_{N}(x_{\alpha})+\log\int_{-\infty}^{x_{\alpha}}\frac{e^{-(t-\theta)^{2}/2+\theta^{2}/2}}{\sqrt{2\pi}}\,dt$
		$\displaystyle=-\log F_{N}(x_{\alpha})+\theta^{2}/2+\log F_{N}(x_{\alpha}-\theta).$

Differentiating at $\theta=0$ and using $f_{N}^{\prime}(x)=-xf_{N}(x)$ yields the expressions

	$\displaystyle\operatorname{CVaR}_{\alpha}(N)$	$\displaystyle=\phi^{\prime}(0)=-\frac{1}{\alpha}f_{N}(x_{\alpha}),$
	$\displaystyle\alpha\operatorname{CVaRv}_{\alpha}(N)$	$\displaystyle=\phi^{\prime\prime}(0)=1+\frac{1}{\alpha}f_{N}^{\prime}(x_{\alpha})-\frac{1}{\alpha^{2}}f_{N}(x_{\alpha})^{2}$
		$\displaystyle=1-\frac{1}{\alpha}x_{\alpha}f_{N}(x_{\alpha})-\frac{1}{\alpha^{2}}f_{N}(x_{\alpha})^{2}.$
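As a sanity check of these closed forms for $\phi^{\prime}(0)$ and $\phi^{\prime\prime}(0)$, the sketch below integrates $tf_{N}(t)$ and $t^{2}f_{N}(t)$ over $(-\infty,x_{\alpha}]$ with a composite Simpson rule and compares the resulting conditional mean and variance with the formulas. Truncating the integration range at $x_{\alpha}-12$ is an assumption of the sketch that discards only negligible mass; $f_{N}$, $F_{N}$ and $F_{N}^{-1}$ come from Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal N(0, 1)


def simpson(g, a, b, m=20_000):
    """Composite Simpson rule for g on [a, b] with m (even) subintervals."""
    h = (b - a) / m
    s = g(a) + g(b)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3


def numerical_moments(alpha):
    """Mean and variance of N conditioned on N <= x_alpha, by quadrature."""
    x = STD.inv_cdf(alpha)
    lo = x - 12.0  # assumption: mass below x_alpha - 12 is negligible
    m1 = simpson(lambda t: t * STD.pdf(t), lo, x) / alpha
    m2 = simpson(lambda t: t * t * STD.pdf(t), lo, x) / alpha
    return m1, m2 - m1 * m1


def closed_form(alpha):
    """phi'(0) and phi''(0) from the formulas above (f_N' = -x f_N)."""
    x = STD.inv_cdf(alpha)
    r = STD.pdf(x) / alpha
    return -r, 1 - x * r - r * r


for a in (0.5, 0.05, 0.001):
    num, ref = numerical_moments(a), closed_form(a)
    print(a, num, ref,
          all(math.isclose(u, v, rel_tol=1e-6) for u, v in zip(num, ref)))
```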

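Since $\alpha=F_{N}(x_{\alpha})=\Gamma(1/2,x_{\alpha}^{2}/2)/(2\sqrt{\pi})$ and $f_{N}(x_{\alpha})=e^{-x_{\alpha}^{2}/2}/\sqrt{2\pi}$, the expansion of $\sqrt{2}/\Gamma(1/2,x^{2}/2)$ above with $x=-x_{\alpha}\to\infty$ gives $f_{N}(x_{\alpha})/\alpha=-x_{\alpha}(1+x_{\alpha}^{-2}-2x_{\alpha}^{-4}+O(x_{\alpha}^{-6}))$. It follows that as $\alpha\to 0$,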

	$\displaystyle\operatorname{CVaR}_{\alpha}(N)$	$\displaystyle=x_{\alpha}\left(1+\frac{1}{x_{\alpha}^{2}}-\frac{2}{x_{\alpha}^{4}}+O(x_{\alpha}^{-6})\right),$
	$\displaystyle\operatorname{CVaRv}_{\alpha}(N)$	$\displaystyle=\frac{1}{\alpha}+\frac{x_{\alpha}^{2}}{\alpha}\left(1+\frac{1}{x_{\alpha}^{2}}-\frac{2}{x_{\alpha}^{4}}+O(x_{\alpha}^{-6})\right)$
		$\displaystyle\qquad-\frac{x_{\alpha}^{2}}{\alpha}\left(1+\frac{1}{x_{\alpha}^{2}}-\frac{2}{x_{\alpha}^{4}}+O(x_{\alpha}^{-6})\right)^{2}$
		$\displaystyle=\frac{1}{\alpha x_{\alpha}^{2}}+O(\alpha^{-1}x_{\alpha}^{-4}).$
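The small-$\alpha$ regime can also be explored numerically. The sketch below evaluates the exact expressions for $\operatorname{CVaR}_{\alpha}(N)$ and $\operatorname{CVaRv}_{\alpha}(N)$ (taking $\operatorname{CVaRv}_{\alpha}(N)=\phi^{\prime\prime}(0)/\alpha$, as above) at decreasing $\alpha$ and reports the relative errors of the leading-order approximations and of the quantile asymptotic for $x_{\alpha}$. All three errors should decrease as $\alpha\to 0$; the variance approximation converges only slowly, since its relative error is of order $x_{\alpha}^{-2}$.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal N(0, 1)


def asymptotic_errors(alpha):
    """Relative errors of the leading-order approximations at level alpha."""
    x = STD.inv_cdf(alpha)  # x_alpha = F_N^{-1}(alpha) < 0
    r = STD.pdf(x) / alpha  # f_N(x_alpha) / alpha
    cvar = -r
    cvarv = (1 - x * r - r * r) / alpha  # phi''(0) / alpha
    cvar_approx = x * (1 + 1 / x**2 - 2 / x**4)
    cvarv_approx = 1 / (alpha * x**2)
    x_approx = -math.sqrt(
        -math.log(4 * math.pi * alpha**2 * math.log(1 / (2 * alpha)))
    )
    return (abs(cvar_approx / cvar - 1),
            abs(cvarv_approx / cvarv - 1),
            abs(x_approx / x - 1))


for a in (1e-2, 1e-4, 1e-8):
    print(a, asymptotic_errors(a))
```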

As a final example, we can consider the case where $X$ has density $f_{X}(x)=\beta x^{-1-\beta}1(x\geq 1)$ with $\beta>0$ (i.e., we consider a power-law tail). Here, one can compute that for $\beta>2$, $\overline{\operatorname{CVaRv}}_{\alpha}(X)=\beta(\beta-1)^{-2}(\beta-2)^{-1}\alpha^{-2/\beta}$, which is worse than the decay in the standard normal case and achieves the worst-case upper bound on the variance in the $\beta\to 2$ limit.
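Reading $\overline{\operatorname{CVaRv}}_{\alpha}(X)$ as the variance of $X$ conditioned on its upper $\alpha$-tail (our assumption, consistent with the overline and the $\alpha^{-2/\beta}$ scaling), the formula can be verified numerically. Since $X=U^{-1/\beta}$ for uniform $U$, the upper $\alpha$-tail is $X=(\alpha w)^{-1/\beta}$ with $w$ uniform on $(0,1)$. The sketch applies a simple midpoint rule in $w$; it uses values of $\beta$ well above $2$, so that the integrable singularity at $w=0$ is mild.

```python
def upper_tail_variance(beta, alpha, m=200_000):
    """Var(X | X in its upper alpha-tail) for density beta * x**(-1-beta) on
    [1, inf). Uses X = (alpha * w)**(-1/beta) with w ~ Uniform(0, 1) on the
    tail and a midpoint rule in w (adequate for beta well above 2)."""
    s1 = s2 = 0.0
    for i in range(m):
        x = (alpha * (i + 0.5) / m) ** (-1.0 / beta)
        s1 += x
        s2 += x * x
    mean = s1 / m
    return s2 / m - mean * mean


def closed_form(beta, alpha):
    """beta (beta-1)^-2 (beta-2)^-1 alpha^(-2/beta), valid for beta > 2."""
    return beta / ((beta - 1) ** 2 * (beta - 2)) * alpha ** (-2.0 / beta)


for beta in (6.0, 10.0):
    for alpha in (0.1, 0.01):
        print(beta, alpha, upper_tail_variance(beta, alpha),
              closed_form(beta, alpha))
```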

Appendix D Relation to brute-force search

A brute-force search enumerates all $2^{n}$ candidate solutions and checks which one is optimal. The sampling overhead of $\sqrt{\gamma}$ on noisy devices can thus be compared with brute-force search, which allows us to derive hardware requirements for QAOA. Assuming, for simplicity, that the probability $p_{x}$ to sample the optimal solution is close to $1$, we require hardware with $\sqrt{\gamma}<2^{n}$. We can relate this to the layer fidelity to obtain a requirement on the hardware quality necessary for a potential quantum advantage. First, we assume that each layer $i$ in a QAOA circuit has the same layer fidelity ${\rm LF}=1/\sqrt{\gamma_{i}}$. As a result, the $\gamma$ of the circuit is $\gamma=\prod_{i=1}^{d(n)}\gamma_{i}=1/{\rm LF}^{2d(n)}$, where $d(n)$ is the depth, defined as the number of non-overlapping two-qubit gate layers. This assumption is reasonable when transpiling QAOA circuits to a line of qubits, which requires layers of CNOT gates applied on every other edge weidenfeller2022scaling . Therefore, the sampling cost to compensate for noise is $\sqrt{\gamma}=1/{\rm LF}^{d(n)}$. For a line of qubits, we may assume that to leading order $d(n)\sim 3np$. The factor $3n$ comes from the fact that $n-2$ layers of SWAP gates are needed to implement full connectivity and each SWAP merged with an $R_{ZZ}$ gate is implemented with three CNOT gates. Here, $p$ is the number of QAOA layers, which is sometimes assumed to grow with the logarithm of the problem size, i.e., $p\propto\log(n)$ Bravyi2019 ; weidenfeller2022scaling . For the sampling overhead to stay below the cost of brute-force search, we therefore require ${\rm LF}^{-3np}<2^{n}$, which implies that the layer fidelity must satisfy

\displaystyle{\rm LF}>\frac{1}{2^{1/(3p)}}.

(33)

This requirement depends on the problem size only through the relation between $p$ and $n$. However, as shown in Ref. mckay2023benchmarking , the layer fidelity decreases with the number of qubits in the layer. If we further assume that layers are dense, i.e., every layer on $n$ qubits consists of approximately $n/2$ CNOT gates, we can compute a corresponding effective CNOT fidelity as ${\rm LF}^{2/n}$, as well as the corresponding lower bound

\displaystyle{\rm LF}^{2/n}>\frac{1}{2^{2/(3pn)}}.

(34)
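The two thresholds are easy to tabulate. The helper functions below have hypothetical names; they evaluate (33) and (34) and compare the sampling overhead $1/{\rm LF}^{3np}$ with $2^{n}$ in log space, under the same leading-order assumption $d(n)\sim 3np$.

```python
import math


def min_layer_fidelity(p):
    """Eq. (33): LF**(-3 n p) < 2**n  <=>  LF > 2**(-1/(3p))."""
    return 2.0 ** (-1.0 / (3 * p))


def min_cnot_fidelity(p, n):
    """Eq. (34): effective CNOT fidelity LF**(2/n) > 2**(-2/(3 p n))."""
    return 2.0 ** (-2.0 / (3 * p * n))


def overhead_below_brute_force(lf, n, p):
    """Compare log2(1/LF**(3np)) with log2(2**n) = n, assuming d(n) = 3np."""
    return -3 * n * p * math.log2(lf) < n


for p in (1, 2, 5):
    print(f"p={p}: LF > {min_layer_fidelity(p):.4f}, "
          f"CNOT fidelity (n=100) > {min_cnot_fidelity(p, 100):.6f}")
```

Working with $\log_{2}$ avoids overflow of $2^{n}$ and ${\rm LF}^{-3np}$ for large $n$.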

Appendix E 40-qubit Circuits

The 40-qubit circuits in the main text are based on those in Ref. Sack2023 . In that work, the authors consider random three-regular graphs transpiled to a line of qubits using a swap network weidenfeller2022scaling . This results in circuits that alternate between only two types of CNOT layers, as described in the main text. Furthermore, the authors carefully chose the mapping from decision variables to physical qubits to minimize the number of layers of the swap network. This method is described in Ref. Matsuo2023 . The code to produce such circuits is available on GitHub BestPractices . The optimal parameters resulting from the light-cone optimization are $(\gamma_{1},\beta_{1})=(2.8405,0.3982)$ for $p=1$ and $(\gamma_{1},\beta_{1},\gamma_{2},\beta_{2})=(1.1506,0.3288,0.1941,0.6582)$ for $p=2$.
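The swap-network idea can be illustrated with a simplified combinatorial model, which is an assumption for illustration and neither the SAT-based initial mapping of Ref. Matsuo2023 nor the code in BestPractices : logical qubits sit on a line, SWAP layers alternate between even and odd edges, and $ZZ$ terms may be applied on every edge between SWAP layers. The sketch counts how many SWAP layers are needed until every pair of logical qubits has been adjacent at least once; for $n\geq 3$ it returns $n-2$, the count used in Appendix D.

```python
from itertools import combinations


def swap_layers_for_all_pairs(n):
    """Smallest number of alternating even/odd SWAP layers on a line of n
    qubits after which every pair of logical qubits has been adjacent,
    assuming ZZ terms can be applied on every edge between SWAP layers."""
    order = list(range(n))  # order[position] = logical qubit
    needed = set(combinations(range(n), 2))
    seen = set()
    layers = 0
    while True:
        # Record every logical pair currently sitting on an edge of the line.
        seen.update(tuple(sorted(order[i:i + 2])) for i in range(n - 1))
        if needed <= seen:
            return layers
        start = layers % 2  # even edges first, then odd edges
        for i in range(start, n - 1, 2):
            order[i], order[i + 1] = order[i + 1], order[i]
        layers += 1


for n in range(2, 9):
    print(n, swap_layers_for_all_pairs(n))
```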

Appendix F Dynamical Decoupling

Dynamical decoupling (DD) suppresses the interaction between a system and a bath by inserting pulses Viola1998 ; Zanardi1999 ; Vitali1999 . Here, we briefly summarize DD following Ref. Ezzell2023 . Consider a time-independent bath $H_{B}$ interacting with the system $H_{S}=H_{S}^{0}+H_{S}^{1}$ through $H_{SB}$. Here, $H_{S}^{1}$ is an undesired, always-on error term. The goal of DD is to insert pulses in idle times such that the time evolution of the system and bath becomes $U(T)=U_{0}(T)B(T)$, where $U_{0}(T)=\exp(-iTH_{S}^{0})\otimes\mathbb{I}_{B}$ is the desired error-free time evolution and $B(T)$ ideally acts on the bath alone.

Consider a single qubit with $H_{B}+H_{SB}=\sum_{\alpha\in\{0,x,y,z\}}\gamma_{\alpha}\sigma^{\alpha}\otimes B^{\alpha}$, where $\sigma^{0}=\mathbb{I}$. Here, $\gamma_{\alpha}$ is a coefficient and $B^{\alpha}$ is the bath operator that couples to the qubit through the Pauli matrix $\sigma^{\alpha}$. The simplest DD sequence is ${\rm PX}=X-d_{\tau}-X-d_{\tau}$, where $d_{\tau}$ denotes a delay of duration $\tau$. Since $X$ anti-commutes with $Y$ and $Z$, the sequence ${\rm PX}$ cancels the $Y\otimes B^{y}$ and $Z\otimes B^{z}$ system-bath interactions. The effective error Hamiltonian after a duration $2\tau$ is $H^{\text{err}}_{\rm PX}=\gamma_{x}X\otimes B^{x}+\mathbb{I}_{s}\otimes\tilde{B}+\mathcal{O}(\tau^{2})$. Hence, $\rm PX$ is not universal, since an $X$ error remains. Universal decoupling to first order is achieved with the XY4 sequence

\displaystyle{\rm XY4}=Y-d_{\tau}-X-d_{\tau}-Y-d_{\tau}-X-d_{\tau}

(35)

which results in the effective error Hamiltonian $H^{\text{err}}_{\rm XY4}=\mathbb{I}_{s}\otimes\tilde{B}+\mathcal{O}(\tau^{2})$ .
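The cancellation argument can be made concrete with a first-order toggling-frame calculation. Assuming ideal, instantaneous pulses, each followed by a delay $\tau$, an error term $\sigma\otimes B$ picks up a sign $\pm 1$ in each free-evolution window, depending on how many of the preceding pulses anti-commute with $\sigma$; the term cancels to first order when these signs sum to zero. The sketch below (hypothetical helper names) applies this bookkeeping to PX and XY4.

```python
def anticommute(a, b):
    """Single-qubit Paulis a, b in 'IXYZ' anticommute iff both are
    non-identity and different."""
    return a != "I" and b != "I" and a != b


def first_order_residual(pulses, error):
    """Sum over the free-evolution windows of the toggling-frame sign of the
    error Pauli, for ideal instantaneous pulses each followed by a delay tau.
    Zero means the term cancels to first order (average Hamiltonian)."""
    sign, total = 1, 0
    for pulse in pulses:
        if anticommute(pulse, error):
            sign = -sign  # conjugation by the pulse flips the error term
        total += sign
    return total


PX = ["X", "X"]
XY4 = ["Y", "X", "Y", "X"]
for name, seq in (("PX", PX), ("XY4", XY4)):
    print(name, {e: first_order_residual(seq, e) for e in "XYZ"})
```

For PX only the $X$ term survives, while XY4 gives zero for all three Pauli terms, in line with $H^{\text{err}}_{\rm XY4}$ above.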

We now consider the two-qubit case. Two fixed-frequency qubits typically exhibit an undesired $ZZ$-coupling, which can be effectively suppressed with DD Tripathi2022 ; Mundada2023 . Simultaneously applying the $\rm PX$ sequence on both qubits cancels unwanted errors arising from $\mathbb{I}\otimes Z$ and $Z\otimes\mathbb{I}$. However, since the simultaneous pulses $X\otimes X$ commute with $Z\otimes Z$, the unwanted $ZZ$ interactions (i.e., cross-talk, which is common in transmon qubits) are still present. This is remedied with staggered DD. We apply the sequence

\displaystyle X_{1}-d_{\tau}-X_{0}-d_{\tau}-X_{1}-d_{\tau}-X_{0}-d_{\tau}

(36)

which staggers two $\rm PX$ sequences. Here, $X_{i}$ is an $X$ gate applied to qubit $i$, which flips the sign of $Z_{i}$ and of $Z_{1}\otimes Z_{0}$ in the toggling frame. In total, the single-qubit $Z_{i}$ error terms change sign twice and the $ZZ$ error term changes sign four times, so that each averages to zero over the sequence. In this work, we apply the staggered XY4 sequence zhou_quantum_2022 (a variant of the staggered XX sequence presented in Mundada2023 ) to ensure a proper cancellation of two-qubit static cross-talk. The staggered XY4 sequence we employ is defined by $Y_{0}-d_{\tau}-Y_{1}-d_{\tau}-X_{0}-d_{\tau}-X_{1}-d_{\tau}-Y_{0}-d_{\tau}-Y_{1}-d_{\tau}-X_{0}-d_{\tau}-X_{1}-d_{\tau}$. As discussed above, it is universal for single-qubit terms and also cancels the static $ZZ$ cross-talk between qubits.
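The same first-order sign counting extends to two qubits: a pulse on qubit $q$ flips the sign of a two-qubit Pauli term whenever it anti-commutes with that term's factor on $q$. Under the same idealized assumptions (instantaneous pulses, equal delays $\tau$), the sketch below compares simultaneous PX, the staggered sequence (36) and the staggered XY4 sequence listed above on the $Z_{0}$, $Z_{1}$ and $Z_{0}Z_{1}$ terms. The helper names are hypothetical.

```python
def anticommute(a, b):
    """Single-qubit Paulis a, b in 'IXYZ' anticommute iff both are
    non-identity and different."""
    return a != "I" and b != "I" and a != b


def first_order_residual(steps, error):
    """steps: list of pulse layers, each a dict {qubit: Pauli} applied at once
    and followed by a delay tau. error: dict {qubit: Pauli}, e.g. Z0 Z1.
    Returns the sum of toggling-frame signs over the delays."""
    sign, total = 1, 0
    for layer in steps:
        for q, pulse in layer.items():
            if anticommute(pulse, error.get(q, "I")):
                sign = -sign
        total += sign
    return total


simultaneous_px = [{0: "X", 1: "X"}, {0: "X", 1: "X"}]
staggered_px = [{1: "X"}, {0: "X"}, {1: "X"}, {0: "X"}]  # Eq. (36)
staggered_xy4 = [{0: "Y"}, {1: "Y"}, {0: "X"}, {1: "X"},
                 {0: "Y"}, {1: "Y"}, {0: "X"}, {1: "X"}]

terms = {"Z0": {0: "Z"}, "Z1": {1: "Z"}, "Z0Z1": {0: "Z", 1: "Z"}}
for name, seq in (("simultaneous PX", simultaneous_px),
                  ("staggered PX", staggered_px),
                  ("staggered XY4", staggered_xy4)):
    print(name, {t: first_order_residual(seq, e) for t, e in terms.items()})
```

Only the simultaneous PX sequence leaves a non-zero $Z_{0}Z_{1}$ residual, which is the motivation for staggering.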

Appendix G 127-qubit QAOA Circuits

To keep the description self-contained, Figure 6, taken from Refs. pelofske2023qavsqaoa ; pelofske2023short , shows the optimized circuits for the 127-qubit higher-order instances. It illustrates that all two-qubit gates needed to implement $e^{-i\gamma H}$ can be scheduled in just three different layers of non-overlapping CNOT gates. In each of the $p$ QAOA rounds, each layer is used once to compute and once to uncompute the $ZZ$ and $ZZZ$ parity values, for an overall CNOT depth of $6p$. The exact QAOA angles, computed heuristically using parameter transfer and yielding a strictly increasing expectation value as $p$ increases up to $5$, are given in Ref. heavy_hex_QAOA_parameter_transfer2023 .
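Why a compute and an uncompute CNOT layer suffice for these phase terms can be seen on computational basis states, since every gate involved is either diagonal or a permutation. In the simplified sketch below (hypothetical helper names; the actual heavy-hex schedule is given in the cited references), the parity of a $ZZ$ or $ZZZ$ term is accumulated onto one of its qubits with CNOTs, an $R_{Z}(2\gamma)$ rotation is applied, and the CNOTs are undone; the basis state is restored and it acquires the phase of $e^{-i\gamma Z\cdots Z}$.

```python
import cmath
import itertools


def parity_phase_circuit(bits, controls, target, gamma):
    """Apply CNOT(c -> target) for each c in controls, Rz(2 gamma) on the
    target, then the same CNOTs in reverse order, to a basis state.
    Returns the output basis state and the accumulated phase."""
    b = list(bits)
    for c in controls:                  # compute the parity onto the target
        b[target] ^= b[c]
    z = 1 - 2 * b[target]               # Z eigenvalue of the target qubit
    phase = cmath.exp(-1j * gamma * z)  # Rz(2 gamma) = exp(-i gamma Z)
    for c in reversed(controls):        # uncompute the parity
        b[target] ^= b[c]
    return tuple(b), phase


def ideal_phase(bits, qubits, gamma):
    """Phase of exp(-i gamma Z...Z) on the given qubits for a basis state."""
    z = 1
    for q in qubits:
        z *= 1 - 2 * bits[q]
    return cmath.exp(-1j * gamma * z)


def cnot_depth(p, layers=3):
    """Each of the `layers` CNOT layers is used twice per QAOA round."""
    return 2 * layers * p


# ZZZ term on qubits (0, 1, 2), parity accumulated on qubit 2.
for bits in itertools.product((0, 1), repeat=3):
    out, ph = parity_phase_circuit(bits, [0, 1], 2, 0.3)
    print(bits, out == bits, abs(ph - ideal_phase(bits, [0, 1, 2], 0.3)) < 1e-12)
print("CNOT depth for p = 5:", cnot_depth(5))
```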

References

(1) A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O’Brien. A variational eigenvalue solver on a photonic quantum processor. Nature Communications, 5(1), 2014. DOI: 10.1038/ncomms5213.
(2) P. J. Ollitrault, A. Miessen, and I. Tavernelli. Molecular Quantum Dynamics: A Quantum Computing Perspective. Accounts of Chemical Research, 54(23):4229–4238, 2021. DOI: 10.1021/acs.accounts.1c00514. PMID: 34787398.
(3) A. Di Meglio, K. Jansen, I. Tavernelli, C. Alexandrou, S. Arunachalam, C. W. Bauer, K. Borras, S. Carrazza, A. Crippa, V. Croft, R. de Putter, A. Delgado, V. Dunjko, D. J. Egger, E. Fernandez-Combarro, E. Fuchs, L. Funcke, D. Gonzalez-Cuadra, M. Grossi, J. C. Halimeh, Z. Holmes, S. Kuhn, D. Lacroix, R. Lewis, D. Lucchesi, M. L. Martinez, F. Meloni, A. Mezzacapo, S. Montangero, L. Nagano, V. Radescu, E. R. Ortega, A. Roggero, J. Schuhmacher, J. Seixas, P. Silvi, P. Spentzouris, F. Tacchino, K. Temme, K. Terashi, J. Tura, C. Tuysuz, S. Vallecorsa, U.-J. Wiese, S. Yoo, and J. Zhang. Quantum Computing for High-Energy Physics: State of the Art and Challenges. Summary of the QC4HEP Working Group, 2023. DOI: 10.48550/arXiv.2307.03236.
(4) P. K. Barkoutsos, F. Gkritsis, P. J. Ollitrault, I. O. Sokolov, S. Woerner, and I. Tavernelli. Quantum algorithm for alchemical optimization in material design. Chemical Science, 12(12):4345–4352, 2021. DOI: 10.1039/D0SC05718E.
(5) V. Havlicek, A. D. Corcoles, K. Temme, A. W. Harrow, A. Kandala, J. M. Chow, and J. M. Gambetta. Supervised learning with quantum-enhanced feature spaces. Nature, 567:209–212, 2019. DOI: 10.1038/s41586-019-0980-2.
(6) C. Zoufal, A. Lucchi, and S. Woerner. Quantum Generative Adversarial Networks for learning and loading random distributions. npj Quantum Information, 5(1), 2019. DOI: 10.1038/s41534-019-0223-2.
(7) C. Zoufal, A. Lucchi, and S. Woerner. Variational quantum Boltzmann machines. Quantum Machine Intelligence, 3(1), 2021. DOI: 10.1007/s42484-020-00033-7.
(8) E. Farhi, J. Goldstone, and S. Gutmann. A Quantum Approximate Optimization Algorithm, 2014. DOI: 10.48550/arXiv.1411.4028.
(9) S. Bravyi, A. Kliesch, R. Koenig, and E. Tang. Obstacles to Variational Quantum Optimization from Symmetry Protection. Physical Review Letters, 125(26):260505, 2020. DOI: 10.1103/PhysRevLett.125.260505.
(10) D. J. Egger, J. Mareček, and S. Woerner. Warm-starting quantum optimization. Quantum, 5:479, 2021. DOI: 10.22331/q-2021-06-17-479.
(11) S. H. Sack and D. J. Egger. Large-scale quantum approximate optimization on non-planar graphs with machine learning noise mitigation, 2023. DOI: 10.48550/arXiv.2307.14427.
(12) S. Woerner and D. J. Egger. Quantum risk analysis. npj Quantum Information, 5(1), 2019. DOI: 10.1038/s41534-019-0130-6.
(13) E. Yndurain, S. Woerner, and D. Egger. Exploring quantum computing use cases for financial services, 2019. Available online: https://www.ibm.com/downloads/cas/2YPRZPB3.[dl:21.11.2023].
(14) N. Stamatopoulos, G. Mazzola, S. Woerner, and W. J. Zeng. Towards quantum advantage in financial market risk using quantum gradient algorithms. Quantum, 6:770, 2022. DOI: 10.22331/q-2022-07-20-770.
(15) M. A. Nielsen and I. L. Chuang. Quantum Computation and Quantum Information: 10th Anniversary Edition. Cambridge University Press, 2011.
(16) D. A. Lidar and T. A. Brun. Quantum Error Correction. Cambridge University Press, 2013. DOI: 10.1017/CBO9781139034807.
(17) E. van den Berg, Z. K. Minev, A. Kandala, and K. Temme. Probabilistic error cancellation with sparse Pauli-Lindblad models on noisy quantum processors. Nature Physics, 19:1116–1121, 2023. DOI: 10.1038/s41567-023-02042-2.
(18) C. Piveteau, D. Sutter, and S. Woerner. Quasiprobability decompositions with reduced sampling overhead. npj Quantum Information, 8(1), 2022. DOI: 10.1038/s41534-022-00517-3.
(19) K. Temme, S. Bravyi, and J. M. Gambetta. Error Mitigation for Short-Depth Quantum Circuits. Physical Review Letters, 119(18), 2017. DOI: 10.1103/physrevlett.119.180509.
(20) Y. Quek, D. S. França, S. Khatri, J. J. Meyer, and J. Eisert. Exponentially tighter bounds on limitations of quantum error mitigation, 2023. DOI: 10.48550/arXiv.2210.11505.
(21) Y. Kim, A. Eddins, S. Anand, K. X. Wei, E. van den Berg, S. Rosenblatt, H. Nayfeh, Y. Wu, M. Zaletel, K. Temme, and A. Kandala. Evidence for the utility of quantum computing before fault tolerance. Nature, 618:500–505, 2023. DOI: 10.1038/s41586-023-06096-3.
(22) S. Anand, K. Temme, A. Kandala, and M. Zaletel. Classical benchmarking of zero noise extrapolation beyond the exactly-verifiable regime, 2023. DOI: 10.48550/arXiv.2306.17839.
(23) S. Bravyi, O. Dial, J. M. Gambetta, D. Gil, and Z. Nazario. The future of quantum computing with superconducting qubits. Journal of Applied Physics, 132(16), 2022. DOI: 10.1063/5.0082975.
(24) C. Zoufal, R. V. Mishmash, N. Sharma, N. Kumar, A. Sheshadri, A. Deshmukh, N. Ibrahim, J. Gacon, and S. Woerner. Variational quantum algorithm for unconstrained black box binary optimization: Application to feature selection. Quantum, 7:909, 2023. DOI: 10.22331/q-2023-01-26-909.
(25) A. Letcher, S. Woerner, and C. Zoufal. From Tight Gradient Bounds for Parameterized Quantum Circuits to the Absence of Barren Plateaus in QGANs, 2023. DOI: 10.48550/arXiv.2309.12681.
(26) P. K. Barkoutsos, G. Nannicini, A. Robert, I. Tavernelli, and S. Woerner. Improving variational quantum optimization using CVaR. Quantum, 4:256, 2020. DOI: 10.22331/q-2020-04-20-256.
(27) J. Wurtz and P. Love. MaxCut quantum approximate optimization algorithm performance guarantees for $p>1$ . Physical Review A, 103(4):042612, 2021. DOI: 10.1103/PhysRevA.103.042612.
(28) E. Knill, D. Leibfried, R. Reichle, J. Britton, R. B. Blakestad, J. D. Jost, C. Langer, R. Ozeri, S. Seidelin, and D. J. Wineland. Randomized benchmarking of quantum gates. Physical Review A, 77(1), 2008. DOI: 10.1103/PhysRevA.77.012307. Publisher: American Physical Society.
(29) C. Dankert, R. Cleve, J. Emerson, and E. Livine. Exact and approximate unitary 2-designs and their application to fidelity estimation. Physical Review A, 80(1):012304, 2009. DOI: 10.1103/PhysRevA.80.012304. Publisher: American Physical Society.
(30) E. Magesan, J. M. Gambetta, and J. Emerson. Scalable and Robust Randomized Benchmarking of Quantum Processes. Physical Review Letters, 106(18):180504, 2011. DOI: 10.1103/PhysRevLett.106.180504. Publisher: American Physical Society.
(31) D. Zwillinger and S. Kokoska. CRC Standard Probability and Statistics Tables and Formulae. CRC Press, 2000. DOI: 10.1201/b16923.
(32) D. C. McKay, I. Hincks, E. J. Pritchett, M. Carroll, L. C. G. Govia, and S. T. Merkel. Benchmarking Quantum Processor Performance at Scale, 2023. DOI: 10.48550/arXiv.2311.05933.
(33) E. van den Berg, Z. K. Minev, and K. Temme. Model-free readout-error mitigation for quantum expectation values. Physical Review A, 105(3), 2022. DOI: 10.1103/physreva.105.032620.
(34) P. D. Nation, H. Kang, N. Sundaresan, and J. M. Gambetta. Scalable Mitigation of Measurement Errors on Quantum Computers. PRX Quantum, 2(4), 2021. DOI: 10.1103/prxquantum.2.040326.
(35) X. Bonet-Monroig, R. Sagastizabal, M. Singh, and T. E. O'Brien. Low-cost error mitigation by symmetry verification. Physical Review A, 98(6), 2018. DOI: 10.1103/physreva.98.062339.
(36) A. Choquette, A. Di Paolo, P. K. Barkoutsos, D. Sénéchal, I. Tavernelli, and A. Blais. Quantum-optimal-control-inspired ansatz for variational quantum algorithms. Physical Review Research, 3(2), 2021. DOI: 10.1103/physrevresearch.3.023092.
(37) J. Weidenfeller, L. C. Valor, J. Gacon, C. Tornow, L. Bello, S. Woerner, and D. J. Egger. Scaling of the quantum approximate optimization algorithm on superconducting qubit based hardware. Quantum, 6:870, 2022. DOI: 10.22331/q-2022-12-07-870.
(38) G. Gentinetta, A. Thomsen, D. Sutter, and S. Woerner. The complexity of quantum support vector machines, 2022. DOI: 10.48550/arXiv.2203.00031.
(39) G. Gentinetta, D. Sutter, C. Zoufal, B. Fuller, and S. Woerner. Quantum Kernel Alignment with Stochastic Gradient Descent, 2023. DOI: 10.48550/arXiv.2304.09899.
(40) S. McArdle, T. Jones, S. Endo, Y. Li, S. C. Benjamin, and X. Yuan. Variational ansatz-based quantum simulation of imaginary time evolution. npj Quantum Information, 5(1), 2019. DOI: 10.1038/s41534-019-0187-2.
(41) X. Yuan, S. Endo, Q. Zhao, Y. Li, and S. C. Benjamin. Theory of variational quantum simulation. Quantum, 3:191, 2019. DOI: 10.22331/q-2019-10-07-191.
(42) C. Zoufal, D. Sutter, and S. Woerner. Error bounds for variational quantum time evolution. Physical Review Applied, 20(4), 2023. DOI: 10.1103/physrevapplied.20.044059.
(43) J. Gacon, C. Zoufal, G. Carleo, and S. Woerner. Simultaneous Perturbation Stochastic Approximation of the Quantum Fisher Information. Quantum, 5:567, 2021. DOI: 10.22331/q-2021-10-20-567.
(44) J. Gacon, C. Zoufal, G. Carleo, and S. Woerner. Stochastic Approximation of Variational Quantum Imaginary Time Evolution, 2023. DOI: 10.48550/arXiv.2305.07059.
(45) J. Gacon, J. Nys, R. Rossi, S. Woerner, and G. Carleo. Variational quantum time evolution without the quantum geometric tensor, 2023. DOI: 10.48550/arXiv.2303.12839.
(46) B. Fuller, C. Hadfield, J. R. Glick, T. Imamichi, T. Itoko, R. J. Thompson, Y. Jiao, M. M. Kagele, A. W. Blom-Schieber, R. Raymond, and A. Mezzacapo. Approximate Solutions of Combinatorial Problems via Quantum Relaxations, 2021. DOI: 10.48550/arXiv.2111.03167.
(47) K. Teramoto, R. Raymond, E. Wakakuwa, and H. Imai. Quantum-Relaxation Based Optimization Algorithms: Theoretical Extensions, 2023. DOI: 10.48550/arXiv.2302.09481.
(48) T. L. Patti, J. Kossaifi, A. Anandkumar, and S. F. Yelin. Variational quantum optimization with multibasis encodings. Physical Review Research, 4(3):033142, 2022. DOI: 10.1103/PhysRevResearch.4.033142.
(49) A. Lucas. Ising formulations of many NP problems. Frontiers in Physics, 2, 2014. DOI: 10.3389/fphy.2014.00005.
(50) M. Streif and M. Leib. Training the quantum approximate optimization algorithm without access to a quantum processing unit. Quantum Science and Technology, 5(3):034008, 2020. DOI: 10.1088/2058-9565/ab8c2b.
(51) S. H. Sack and M. Serbyn. Quantum annealing initialization of the quantum approximate optimization algorithm. Quantum, 5:491, 2021. DOI: 10.22331/q-2021-07-01-491.
(52) T. Begušić, K. Hejazi, and G. K.-L. Chan. Simulating quantum circuit expectation values by clifford perturbation theory, 2023. DOI: 10.48550/arXiv.2306.04797.
(53) S. Hadfield, Z. Wang, B. O’Gorman, E. G. Rieffel, D. Venturelli, and R. Biswas. From the Quantum Approximate Optimization Algorithm to a Quantum Alternating Operator Ansatz. Algorithms, 12(2):34, 2019. DOI: 10.3390/a12020034.
(54) Z. Wang, N. C. Rubin, J. M. Dominy, and E. G. Rieffel. $XY$ mixers: Analytical and numerical results for the quantum alternating operator ansatz. Physical Review A, 101(1):012320, 2020. DOI: 10.1103/PhysRevA.101.012320.
(55) J. Cook, S. Eidenbenz, and A. Bärtschi. The Quantum Alternating Operator Ansatz on Maximum k-Vertex Cover. In IEEE International Conference on Quantum Computing & Engineering QCE’20, pages 83–92, 2020. DOI: 10.1109/QCE49297.2020.00021.
(56) J. Golden, A. Bärtschi, S. Eidenbenz, and D. O’Malley. Numerical Evidence for Exponential Speed-up of QAOA over Unstructured Search for Approximate Constrained Optimization. In IEEE International Conference on Quantum Computing & Engineering QCE’23, pages 496–505, 2023. DOI: 10.1109/QCE57702.2023.00063.
(57) A. Bärtschi and S. Eidenbenz. Short-Depth Circuits for Dicke State Preparation. In IEEE International Conference on Quantum Computing & Engineering QCE’22, pages 87–96, 2022. DOI: 10.1109/QCE53715.2022.00027.
(58) A. Bärtschi and S. Eidenbenz. Grover Mixers for QAOA: Shifting Complexity from Mixer Design to State Preparation. In IEEE International Conference on Quantum Computing & Engineering QCE’20, pages 72–82, 2020. DOI: 10.1109/QCE49297.2020.00020.
(59) Y. Liu, S. Arunachalam, and K. Temme. A rigorous and robust quantum speed-up in supervised machine learning. Nature Physics, 17(9):1013–1017, 2021. DOI: 10.1038/s41567-021-01287-z.
(60) IBM Quantum. IBM Quantum Platform - Compute resources. https://quantum-computing.ibm.com/services/resources, 2023. [Online; accessed 20-Nov-2023].
(61) S. Sheldon, E. Magesan, J. M. Chow, and J. M. Gambetta. Procedure for systematically tuning up cross-talk in the cross-resonance gate. Physical Review A, 93(6):060302, 2016. DOI: 10.1103/PhysRevA.93.060302.
(62) At the time of writing the experiment to measure layer fidelity is under implementation in Qiskit Experiments QiskitExperiments . See https://github.com/Qiskit-Extensions/qiskit-experiments.
(63) E. Pelofske, A. Bärtschi, and S. Eidenbenz. Quantum Annealing vs. QAOA: 127 Qubit Higher-Order Ising Problems on NISQ Computers. In International Conference on High Performance Computing ISC HPC’23, pages 240–258, 2023. DOI: 10.1007/978-3-031-32041-5_13.
(64) E. Pelofske, A. Bärtschi, and S. Eidenbenz. Short-Depth QAOA circuits and Quantum Annealing on Higher-Order Ising Models. npj Quantum Information, 2023. DOI: 10.2172/1985256. Accepted.
(65) C. Chamberland, G. Zhu, T. J. Yoder, J. B. Hertzberg, and A. W. Cross. Topological and subsystem codes on low-degree graphs with flag qubits. Physical Review X, 10(1), 2020. DOI: 10.1103/physrevx.10.011022.
(66) E. Pelofske, A. Bärtschi, L. Cincio, J. Golden, and S. Eidenbenz. Scaling Whole-Chip QAOA for Higher-Order Ising Spin Glass Models on Heavy-Hex Graphs, 2023. LANL report LA-UR-23-33192; to appear.
(67) IBM ILOG CPLEX. V12.10.0: User’s Manual for CPLEX. International Business Machines Corporation, 46(53):157, 2009.
(68) J. J. Wallman. Bounding experimental quantum error rates relative to fault-tolerant thresholds, 2016. DOI: 10.48550/arXiv.1511.00727.
(69) J. J. Wallman and J. Emerson. Noise tailoring for scalable quantum computation via randomized compiling. Physical Review A, 94(5), 2016. DOI: 10.1103/physreva.94.052325.
(70) J. Dedecker and F. Merlevède. Central limit theorem and almost sure results for the empirical estimator of superquantiles/CVaR in the stationary case. Statistics, 56(1):53–72, 2022. DOI: 10.1080/02331888.2022.2043325.
(71) NIST Digital Library of Mathematical Functions. https://dlmf.nist.gov/, Release 1.1.11 of 2023-09-15. Available online: https://dlmf.nist.gov/. F. W. J. Olver, A. B. Olde Daalhuis, D. W. Lozier, B. I. Schneider, R. F. Boisvert, C. W. Clark, B. R. Miller, B. V. Saunders, H. S. Cohl, and M. A. McClain, eds.
(72) A. Matsuo, S. Yamashita, and D. J. Egger. A SAT Approach to the Initial Mapping Problem in SWAP Gate Insertion for Commuting Gates. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E106.A(11):1424–1431, 2023. DOI: 10.1587/transfun.2022eap1159.
(73) Best practices in quantum optimization. Available online: https://github.com/qiskit-community/qopt-best-practices.
(74) L. Viola and S. Lloyd. Dynamical suppression of decoherence in two-state quantum systems. Physical Review A, 58(4):2733, 1998. DOI: 10.1103/PhysRevA.58.2733.
(75) P. Zanardi. Symmetrizing evolutions. Physics Letters A, 258(2–3):77–82, 1999. DOI: 10.1016/S0375-9601(99)00365-5.
(76) D. Vitali and P. Tombesi. Using parity kicks for decoherence control. Physical Review A, 59(6):4178, 1999. DOI: 10.1103/PhysRevA.59.4178.
(77) N. Ezzell, B. Pokharel, L. Tewala, G. Quiroz, and D. A. Lidar. Dynamical decoupling for superconducting qubits: a performance survey, 2023. DOI: 10.48550/arXiv.2207.03670.
(78) V. Tripathi, H. Chen, M. Khezri, K.-W. Yip, E. Levenson-Falk, and D. A. Lidar. Suppression of Crosstalk in Superconducting Qubits Using Dynamical Decoupling. Physical Review Applied, 18(2):024068, 2022. DOI: 10.1103/PhysRevApplied.18.024068.
(79) P. S. Mundada, A. Barbosa, S. Maity, Y. Wang, T. Merkh, T. Stace, F. Nielson, A. R. Carvalho, M. Hush, M. J. Biercuk, and Y. Baum. Experimental Benchmarking of an Automated Deterministic Error-Suppression Workflow for Quantum Algorithms. Physical Review Applied, 20(2):024034, 2023. DOI: 10.1103/PhysRevApplied.20.024034.
(80) Z. Zhou, R. Sitler, Y. Oda, K. Schultz, and G. Quiroz. Quantum Crosstalk Robust Quantum Control, 2023. DOI: 10.1103/PhysRevLett.131.210802.
(81) N. Kanazawa, D. J. Egger, Y. Ben-Haim, H. Zhang, W. E. Shanks, G. Aleksandrowicz, and C. J. Wood. Qiskit experiments: A python package to characterize and calibrate quantum computers. Journal of Open Source Software, 8(84):5329, 2023. DOI: 10.21105/joss.05329.