Provable bounds for noise-free expectation values computed from noisy samples
Abstract
In this paper, we explore the impact of noise on quantum computing, particularly focusing on the challenges when sampling bit strings from noisy quantum computers as well as the implications for optimization and machine learning applications. We formally quantify the sampling overhead to extract good samples from noisy quantum computers and relate it to the layer fidelity, a metric to determine the performance of noisy quantum processors. Further, we show how this allows us to use the Conditional Value at Risk of noisy samples to determine provable bounds on noise-free expectation values. We discuss how to leverage these bounds for different algorithms and demonstrate our findings through experiments on a real quantum computer involving up to 127 qubits. The results show a strong alignment with theoretical predictions.
I Introduction
Quantum computing is a new computational paradigm which promises to impact many disciplines, ranging from quantum chemistry peruzzo_2014_vqe ; ollitrault_2021_dynamics , quantum physics dimeglio2023quantum , and material sciences barkoutsos_2021_alchemical , to machine learning Havlicek2019 ; Zoufal_2019_qgan ; Zoufal_2021_varqbm , optimization farhi_2014_qaoa ; Bravyi2019 ; egger2021warm ; Sack2023 , and finance Woerner_2019_risk ; yndurain_2019_quantum_finance ; Stamatopoulos_2022_market_risk . However, leveraging near-term quantum computers is difficult due to the noise present in the systems. Ultimately, this needs to be addressed by quantum error correction, which exponentially suppresses errors by encoding logical qubits in multiple physical qubits nielsen_and_chuang ; lidar_brun_2013_qec .
In near-term devices, implementing error correction is infeasible, so we must find other ways to handle the noise. A promising approach to bridge the gap between noisy and error-corrected quantum computing is error mitigation. Here, we leverage multiple noisy estimates to construct a better approximation of the noise-free result. The most prominent examples are Probabilistic Error Cancellation (PEC) berg2023probabilistic ; Piveteau_2022 and Zero Noise Extrapolation (ZNE) Temme_2017 . While the cost of error mitigation generally scales exponentially quek2023exponentially , a combination of PEC and ZNE has recently been impressively demonstrated in a 127-qubit experiment at a circuit depth beyond the reach of exact classical methods kim_2023_utility ; anand2023classical . The rate of the exponential cost of error mitigation directly relates to the errors in the quantum devices. It is expected that these errors can be reduced to a level at which noisy devices with error mitigation can already perform practically relevant tasks, even before error correction Bravyi_2022 . PEC and ZNE mitigate the errors in expectation values. While this finds many applications, e.g., in quantum chemistry and physics, most quantum optimization farhi_2014_qaoa ; egger2021warm ; zoufal_2023_blackbox and many quantum machine learning algorithms Zoufal_2019_qgan ; letcher2023tight build directly on top of measured samples from a quantum computer. In optimization, having access to an objective value but not to the samples corresponds to knowing the value of an optimal solution but not how to realize it. Getting these samples is thus a key problem for scaling sample-based algorithms on noisy hardware.
In this paper, we discuss the impact of noise on sampling bit strings from a noisy quantum computer and quantify the sampling overhead required to extract good solutions from noisy devices, e.g., in the context of optimization. Furthermore, we connect our findings to the Conditional Value at Risk (CVaR, also known as Expected Shortfall), an alternative loss function introduced in Ref. barkoutsos_2020_cvar . We show that CVaR is robust against noise and can generate meaningful results from noisy samples, also for expectation values. This feature was already conjectured in Ref. barkoutsos_2020_cvar but has not been shown formally. Our work closes this gap and shows that CVaR evaluated on noisy samples achieves provable bounds on noise-free observables. We demonstrate these bounds with up to 127 qubits on a real quantum computer applied to optimization problems, where we find close agreement between the experiments and theory. In particular, this allows us to apply the known noise-free performance bounds for the Quantum Approximate Optimization Algorithm (QAOA) for MAXCUT on 3-regular graphs farhi_2014_qaoa ; wurtz_2021_qaoa . Thus, our work results in provable performance guarantees for a variational algorithm, even on noisy hardware.
The remainder of this paper is organized as follows. First, Sec. II discusses the impact of noise on sampling and how to quantify it. Then, Sec. III formally defines the CVaR and shows that it can provide provable bounds on noise-free expectation values from noisy samples. Afterwards, Sec. IV discusses the implications of the presented results in the context of applications in optimization, machine learning, and quantum time evolution. Sec. V demonstrates the results on a real quantum computer with up to 127 qubits, where we find close agreement with the theory. Last, Sec. VI concludes the paper and discusses open questions for further research.
II Sampling from Noisy Quantum Computers
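Suppose an initial $n$-qubit quantum state $\rho_0$, a quantum operation $\mathcal{U}$, and the resulting state $\rho = \mathcal{U}(\rho_0)$. On a real quantum computer, we usually do not have access to the ideal operation $\mathcal{U}$ but only to a noisy version $\tilde{\mathcal{U}}$, which we model by $\tilde{\mathcal{U}} = \Lambda \circ \mathcal{U}$. Here, $\Lambda$ denotes the noise model. We denote the resulting noisy state by $\tilde{\rho} = \tilde{\mathcal{U}}(\rho_0)$.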
For simplicity, we assume the Pauli-Lindblad noise model introduced in Ref. berg2023probabilistic

$$\Lambda = \bigcirc_{k \in \mathcal{K}} \Lambda_k, \qquad \Lambda_k(\rho) = w_k \rho + (1 - w_k) P_k \rho P_k^\dagger, \qquad w_k = \frac{1 + e^{-2\lambda_k}}{2}. \tag{1}$$

Here, $\mathcal{K}$ denotes the index set for the (local) Pauli error terms $P_k$, and $\lambda_k$ denotes the corresponding model coefficients that determine the strength of the noise. The assumption of Pauli noise can usually be justified via Pauli twirling knill_randomized_2008 ; dankert_exact_2009 ; magesan_scalable_2011 . In Appendix A we discuss Pauli twirling and the assumption of a Pauli noise model in more detail.
In general, a quantum circuit is not a single operation but a concatenation of layers, $\mathcal{U} = \mathcal{U}_L \circ \dots \circ \mathcal{U}_1$. Their noisy versions are $\tilde{\mathcal{U}}_l = \Lambda_l \circ \mathcal{U}_l$, $l = 1, \dots, L$, with corresponding noise models $\Lambda_l$. Crucially, this allows us to learn the noise model for each layer independently berg2023probabilistic . A common assumption is that the layers consist of non-overlapping CNOT gates (or other hardware-native two-qubit Clifford gates), possibly alternating with layers of single-qubit gates. Single-qubit gates are assumed to be noise-free, since their errors are an order of magnitude smaller than those of two-qubit gates. Therefore, only the noise of the two-qubit gate layers is considered.
Assuming the above layer structure and a sparse noise model for the quantum processor, Ref. berg2023probabilistic introduces a protocol to efficiently learn the model coefficients $\lambda_k$. A property of $\Lambda$ that characterizes the overall strength of the noise is $\gamma = \exp\big(2 \sum_{k \in \mathcal{K}} \lambda_k\big)$. This quantity has a direct operational interpretation, since $\gamma^2$ defines the sampling overhead of applying PEC to mitigate the noise when estimating an expectation value Temme_2017 ; berg2023probabilistic .
Here, we first focus on sampling from noisy quantum computers instead of estimating expectation values. Suppose we prepare a quantum state and afterwards measure the qubits. Then, the probability of sampling a bit string $x \in \{0,1\}^n$ is given by $p(x) = \langle x|\rho|x\rangle$ for the noise-free state $\rho$ and by $\tilde{p}(x) = \langle x|\tilde{\rho}|x\rangle$ for the noisy state $\tilde{\rho}$. The noise model introduced in Eq. (1) can also be interpreted as follows: with probability $\prod_k w_k$ we sample a bit string from $\rho$, and with probability $1 - \prod_k w_k$ we sample from a state in which at least one error occurred. Here, we assume $\lambda_k \geq 0$, so that we can use $w_k \geq e^{-2\lambda_k}$. It immediately follows that $\prod_k w_k \geq \prod_k e^{-2\lambda_k} = 1/\gamma$, and thus $1 - \prod_k w_k \leq 1 - 1/\gamma$. The law of total probability kokosaka_2000_probability then implies the lower bound

$$\tilde{p}(x) \geq \frac{p(x)}{\gamma}. \tag{2}$$
In other words, suppose a noise-free state $\rho$ has probability $p(x)$ of producing a bit string of interest $x$, and $\rho$ is approximated by $\tilde{\rho}$, prepared through a noisy process characterized by $\gamma$. Then a multiplicative sampling overhead of $\gamma$ guarantees at least the same probability of sampling $x$ as for the noise-free state. Thus, as long as we are only interested in generating relevant bit strings that we can efficiently evaluate classically, we can deal with the noise by measuring $\gamma$ times more often. This contrasts with the multiplicative sampling overhead $\gamma^2$ introduced by PEC when we are interested in estimating expectation values. Interestingly, if we apply PEC and then determine only the sampling probabilities, without evaluating an expectation value, we find that the sampling probabilities are lower bounded by $p(x)/\gamma^2$. In other words, PEC “amplifies” the noise to achieve an unbiased estimate of expectation values; see Appendix B for more details.
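As a sanity check of Eqs. (1) and (2), the sketch below propagates only the measurement statistics through a Pauli-Lindblad channel. It assumes a hypothetical three-qubit distribution and hand-picked error terms rather than a learned noise model. For computational-basis probabilities, a Pauli error $P_k$ acts only through its X/Y components, which flip a fixed set of bits. Z-type terms leave the probabilities unchanged but still count towards $\gamma$, which is one reason the bound can be loose.

```python
import numpy as np


def pauli_lindblad_on_probs(p, terms):
    """Apply rho -> w_k rho + (1 - w_k) P_k rho P_k^dagger to the diagonal of rho.

    `terms` is a list of (x_mask, lam): x_mask marks the bits flipped by P_k
    (its X or Y components) and lam is the model coefficient lambda_k >= 0.
    """
    idx = np.arange(len(p))
    for x_mask, lam in terms:
        w = (1.0 + np.exp(-2.0 * lam)) / 2.0
        p = w * p + (1.0 - w) * p[idx ^ x_mask]   # a bit flip permutes the outcomes
    return p


def noise_strength(terms):
    """gamma = exp(2 * sum_k lambda_k); PEC would need about gamma**2 more shots."""
    return float(np.exp(2.0 * sum(lam for _, lam in terms)))


n = 3
rng = np.random.default_rng(7)
p = rng.dirichlet(np.ones(2 ** n))                 # hypothetical noise-free p(x)
terms = [(0b001, 0.05), (0b110, 0.03), (0b010, 0.08), (0b000, 0.10)]  # last term is Z-type

p_noisy = pauli_lindblad_on_probs(p, terms)
gamma = noise_strength(terms)
print(f"gamma = {gamma:.3f}, PEC overhead ~ {gamma ** 2:.3f}")
print("Eq. (2) holds:", bool(np.all(p_noisy >= p / gamma)))
```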
The sampling overhead $\gamma$ can be derived from the noise model produced by the noise-learning protocol introduced in Ref. berg2023probabilistic . However, in the present context, we do not need the full description of the noise model, only $\gamma$. Recently, Ref. mckay2023benchmarking introduced the Layer Fidelity (LF), a metric that measures the noise present in the hardware when executing a circuit. The LF also assumes the layered gate structure mentioned above and determines the resulting fidelity for each layer of gates. The LF of layer $l$ is directly connected to the sampling overhead $\gamma_l$ that characterizes the noise of that layer. For multiple layers, the overall overhead is $\gamma = \prod_{l=1}^{L} \gamma_l$, and we can thus rewrite Eq. (2) as

$$\tilde{p}(x) \geq p(x) \prod_{l=1}^{L} \frac{1}{\gamma_l}. \tag{3}$$
Further, the LF has the advantage that it is very cheap to evaluate compared to learning the full noise model. Thus, for a given circuit, the LF allows us to efficiently determine the sampling overhead required to compensate for the noise.
State preparation and measurement (SPAM) errors are another type of error that we have not yet discussed. In principle, we can also determine a sampling overhead for SPAM errors and compensate for them by increasing the number of samples. However, particularly for measurement errors, other protocols exist that might allow statistical corrections with a smaller sampling overhead van_den_Berg_2022_trex ; Nation_2021_m3 . A systematic study of these types of errors would be interesting for future research.
III Conditional Value-at-Risk
Section II shows that we can sample bit strings of interest $x$, i.e., bit strings produced by the noise-free state $\rho$, by taking $\gamma$ times more samples from the noisy state $\tilde{\rho}$. However, we usually do not know which samples correspond to the noise-free state and which were affected by noise. We now build on the insight of Sec. II and show that the CVaR can provide provable bounds on noise-free expectation values from noisy samples. The CVaR was already suggested as a loss function and observable in Ref. barkoutsos_2020_cvar , but only on the basis of intuition and without theoretical justification.
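Consider an integrable real-valued random variable $X$ with cumulative distribution function $F_X$. Then, the (lower) CVaR at level $\alpha \in (0, 1]$ is defined as

$$\mathrm{CVaR}^-_\alpha(X) = \frac{1}{\alpha}\Big(\mathbb{E}\big[X \, \mathbf{1}_{\{X \leq q_\alpha\}}\big] + q_\alpha\big(\alpha - F_X(q_\alpha)\big)\Big),$$

where $q_\alpha = \inf\{t \in \mathbb{R} : F_X(t) \geq \alpha\}$ is the lower $\alpha$-quantile of $X$. In the case $F_X(q_\alpha) = \alpha$, this definition simplifies to $\mathrm{CVaR}^-_\alpha(X) = \mathbb{E}[X \mid X \leq q_\alpha]$, i.e., the expectation of $X$ conditioned on $X$ taking values in its bottom $\alpha$-quantile. Accordingly, we define the upper CVaR as

$$\mathrm{CVaR}^+_\alpha(X) = -\mathrm{CVaR}^-_\alpha(-X). \tag{4}$$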
Therefore, we consider the expectation of $X$ conditioned on values in its upper $\alpha$-quantile. This allows us to prove the following lemma.
Lemma 1.
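Suppose a discrete random variable $X$ with probabilities $p(x) = \mathbb{P}(X = x)$ for $x \in \mathcal{X}$. Further, suppose another random variable $\tilde{X}$ with probabilities $\tilde{p}(x) = \mathbb{P}(\tilde{X} = x)$, as well as a given constant $\gamma \geq 1$ such that $\tilde{p}(x) \geq p(x)/\gamma$ for all $x$. Then we have

$$\mathrm{CVaR}^-_\alpha(\tilde{X}) \leq \mathbb{E}[X] \leq \mathrm{CVaR}^+_\alpha(\tilde{X}) \tag{5}$$

for all $\alpha \in (0, 1/\gamma]$. Thus, the lower and upper CVaR of $\tilde{X}$ with $\alpha = 1/\gamma$ define lower and upper bounds, respectively, on the expectation value of $X$.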
Proof.
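By monotonicity of $\mathrm{CVaR}^-_\alpha$ in $\alpha$, it suffices to show the claim for $\alpha = 1/\gamma$. Let $\mathcal{S}$ denote the support of $\tilde{X}$. Take $q: \mathcal{S} \to [0, 1]$ such that $\sum_{x \in \mathcal{S}} q(x) = 1$ and $q(x) \leq \tilde{p}(x)/\alpha$ for all $x \in \mathcal{S}$. Then

$$\sum_{x \in \mathcal{S}} x\, q(x) \geq \mathrm{CVaR}^-_\alpha(\tilde{X}),$$

since $\alpha q$ is a sub-distribution of $\tilde{p}$ with total mass $\alpha$, and its average cannot be smaller than the average over the lowest $\alpha$ of the mass.
Clearly, the minimizing $q$ satisfying $q(x) \leq \tilde{p}(x)/\alpha$ for all $x$, which assigns as much mass as allowed to the smallest values, is also supported on $\mathcal{S}$ and satisfies

$$\sum_{x \in \mathcal{S}} x\, q(x) = \mathrm{CVaR}^-_\alpha(\tilde{X}).$$

From this, the claim follows immediately by using the above to lower bound $\mathbb{E}[X] = \sum_x x\, p(x)$: since $p(x) \leq \gamma\, \tilde{p}(x) = \tilde{p}(x)/\alpha$, the distribution $p$ is supported on $\mathcal{S}$ and is an admissible choice of $q$. The upper bound follows by applying the lower bound to $-X$ and $-\tilde{X}$ in place of $X$ and $\tilde{X}$. ∎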
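Below is a minimal numerical sketch of the CVaR definitions and of Lemma 1 for finitely supported random variables. The lower CVaR fills probability mass $\alpha$ from the smallest values upward and splits the boundary atom, as in the definition above; the upper CVaR uses Eq. (4). The distributions are random and hypothetical. We build $\tilde{p}$ as a mixture so that $\tilde{p}(x) \geq p(x)/\gamma$ holds by construction.

```python
import numpy as np


def cvar_lower(values, probs, alpha):
    """CVaR^-_alpha of a discrete random variable: average over the lowest alpha of the mass."""
    order = np.argsort(values)
    v = np.asarray(values, dtype=float)[order]
    q = np.asarray(probs, dtype=float)[order]
    mass_before = np.concatenate(([0.0], np.cumsum(q)[:-1]))
    take = np.clip(alpha - mass_before, 0.0, q)      # boundary atom is taken fractionally
    return float(take @ v / alpha)


def cvar_upper(values, probs, alpha):
    """CVaR^+_alpha(X) = -CVaR^-_alpha(-X), cf. Eq. (4)."""
    return -cvar_lower(-np.asarray(values, dtype=float), probs, alpha)


rng = np.random.default_rng(0)
values = rng.normal(size=20)                 # hypothetical support of X
p = rng.dirichlet(np.ones(20))               # noise-free probabilities p(x)
other = rng.dirichlet(np.ones(20))           # arbitrary "noisy" component
gamma = 4.0
p_tilde = p / gamma + (1.0 - 1.0 / gamma) * other   # guarantees p_tilde >= p / gamma

mean = float(p @ values)
for a in (1.0 / gamma, 0.5 / gamma):
    lo, hi = cvar_lower(values, p_tilde, a), cvar_upper(values, p_tilde, a)
    print(f"alpha={a:.3f}: {lo:.3f} <= E[X]={mean:.3f} <= {hi:.3f}")
```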
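Next, consider again a noise-free $n$-qubit quantum state $\rho$, its noisy version $\tilde{\rho}$, and the corresponding $\gamma$. Further, suppose a diagonal Hamiltonian $H = \sum_{x} h(x) |x\rangle\langle x|$, which can also be interpreted as a function $h: \{0,1\}^n \to \mathbb{R}$. Let us define the random variables $X = h(x)$ with $x \sim p$ and $\tilde{X} = h(x)$ with $x \sim \tilde{p}$ as the results of measuring $\rho$ and $\tilde{\rho}$, respectively. Then, Lemma 1 and Eq. (2), summed over all bit strings with the same value $h(x)$, immediately imply

$$\mathrm{CVaR}^-_\alpha(\tilde{X}) \leq \mathrm{Tr}[H\rho] \leq \mathrm{CVaR}^+_\alpha(\tilde{X}) \tag{6}$$

for all $\alpha \in (0, 1/\gamma]$. Since $\mathbb{E}[X] = \mathrm{Tr}[H\rho]$ for a diagonal $H$, Eq. (6) implies that the lower/upper CVaR computed from the noisy samples provides a lower/upper bound on the noise-free expectation value of $H$. Further, suppose $\rho$ is the ground state of the diagonal $H$, with energy $E_0$. Then $\tilde{X}$ cannot take any values smaller than $E_0$, and the left inequality in Eq. (6) becomes an equality. Thus, the noisy lower CVaR equals the ground-state energy (and similarly for the upper CVaR if $\rho$ corresponds to the maximally excited state of $H$). Further, we also know that if the noisy lower CVaR equals the ground-state energy, the fidelity between the (non-degenerate) ground state $\rho$ and the noisy state is lower bounded by the chosen $\alpha$, i.e., $\mathrm{Tr}[\rho\tilde{\rho}] \geq \alpha$.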
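To illustrate Eq. (6) with samples rather than exact distributions, the sketch below uses a hypothetical MAXCUT objective on a 4-node ring. It assumes a toy noise model in which each shot comes from the ideal state with probability $1/\gamma$ and is uniformly random otherwise. The simple estimator keeps the $\lceil \alpha N \rceil$ smallest (or largest) sampled values; this Monte Carlo rounding is our choice and is not taken from the text. Because the ideal state here maximizes the cut, the upper CVaR recovers the optimal value exactly, mirroring the extremal-state argument above.

```python
import numpy as np


def cut_value(x, edges):
    """Number of edges (i, j) whose endpoints have different bits in the integer bit string x."""
    return sum(((x >> i) & 1) != ((x >> j) & 1) for i, j in edges)


def empirical_cvar(samples, alpha, upper=False):
    """Mean of the ceil(alpha * N) smallest (upper=False) or largest (upper=True) samples."""
    s = np.sort(np.asarray(samples, dtype=float))
    if upper:
        s = s[::-1]
    k = max(1, int(np.ceil(alpha * len(s))))
    return float(s[:k].mean())


edges = [(0, 1), (1, 2), (2, 3), (3, 0)]      # hypothetical 4-node ring
ideal = [0b0101, 0b1010]                      # the two maximum cuts (cut value 4)
gamma, shots = 5.0, 20_000
rng = np.random.default_rng(3)

from_ideal = rng.random(shots) < 1.0 / gamma  # toy noise: error-free with probability 1/gamma
bitstrings = np.where(from_ideal, rng.choice(ideal, size=shots), rng.integers(0, 16, size=shots))
values = [cut_value(int(x), edges) for x in bitstrings]

alpha = 0.5 / gamma
lower, upper = empirical_cvar(values, alpha), empirical_cvar(values, alpha, upper=True)
print(f"{lower:.2f} <= Tr[H rho] = 4 <= {upper:.2f}")
```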
Diagonal Hamiltonians arise, e.g., in optimization problems or in the form of projectors $|x\rangle\langle x|$, which can be used, e.g., for fidelity estimation. We discuss these applications in more detail in Sec. IV.1 and Sec. IV.2. However, many applications also involve non-diagonal Hamiltonians, most prominently in quantum chemistry and physics peruzzo_2014_vqe . Suppose a non-diagonal Hamiltonian $H = \sum_j c_j P_j$, where $P_j$ denote Pauli terms and $c_j \in \mathbb{R}$ the corresponding weights. Then, we can decompose $H$ into a sum $H = \sum_{i=1}^{m} H_i$ of Hamiltonians $H_i$, each consisting of a subset of qubit-wise commuting Pauli strings. All Pauli terms in $H_i$ can be simultaneously diagonalized via single-qubit Pauli rotations. Thus, we can assume without loss of generality that the $H_i$ are diagonal. We define the corresponding functions $h_i$, as well as the noise-free and noisy random variables $X_i$ and $\tilde{X}_i$, respectively, resulting from measuring the quantum states after the post-rotations that diagonalize $H_i$. This implies

$$\sum_{i=1}^{m} \mathrm{CVaR}^-_\alpha(\tilde{X}_i) \leq \mathrm{Tr}[H\rho] \leq \sum_{i=1}^{m} \mathrm{CVaR}^+_\alpha(\tilde{X}_i) \tag{7}$$

for all $\alpha \in (0, 1/\gamma]$, which extends the previous result to non-diagonal Hamiltonians. Note that, in contrast to diagonal Hamiltonians, we can no longer draw conclusions about the ground-state energy or the fidelity between the noisy state and the ground state. For instance, the lower bound in Eq. (7) can be strictly smaller than the ground-state energy.
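The decomposition behind Eq. (7) can be sketched in three pieces: a greedy first-fit grouping of Pauli strings into qubit-wise commuting sets, the single-qubit measurement basis that diagonalizes each set, and the $\pm 1$ eigenvalue of a Pauli string on a bit string measured in that basis. Greedy grouping is one common heuristic and is not guaranteed to yield the minimal number of groups. The Pauli strings in the example are hypothetical, and we assume the $i$-th character refers to qubit $i$.

```python
def qubitwise_commute(a, b):
    """Pauli strings such as 'XIZ' commute qubit-wise if, per qubit, the letters match or one is 'I'."""
    return all(x == y or x == "I" or y == "I" for x, y in zip(a, b))


def group_qwc(paulis):
    """Greedy first-fit grouping into qubit-wise commuting sets H_i."""
    groups = []
    for p in paulis:
        for g in groups:
            if all(qubitwise_commute(p, q) for q in g):
                g.append(p)
                break
        else:
            groups.append([p])
    return groups


def measurement_basis(group):
    """Per-qubit basis ('X', 'Y' or 'Z') that diagonalizes every string in the group."""
    basis = []
    for i in range(len(group[0])):
        letters = {p[i] for p in group} - {"I"}
        basis.append(letters.pop() if letters else "Z")   # unused qubits: measure in Z
    return "".join(basis)


def pauli_eigenvalue(pauli, bits):
    """Eigenvalue (+1/-1) of `pauli` on a bit string `bits` measured in the rotated basis."""
    parity = sum(int(b) for p, b in zip(pauli, bits) if p != "I")
    return -1 if parity % 2 else 1


terms = ["ZZI", "IZZ", "XXI", "XIX"]
groups = group_qwc(terms)
print(groups, [measurement_basis(g) for g in groups])
```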
The CVaR can be estimated using Monte Carlo sampling. The variance of this estimator depends on the type of distribution considered, but it always admits an upper bound. For Normal and Bernoulli distributions, one can even show that, in the present context, the variance of the CVaR estimator behaves as $\mathcal{O}(1/\alpha)$ for $\alpha \to 0$. For the Bernoulli case, this assumes a condition on the success probability that covers the relevant case for the applications we consider later on, cf. Sec. IV.2. The derivations of the variance bounds for CVaR estimation are provided in Appendix C. Thus, in these cases and for $\alpha = 1/\gamma$, the variance grows as $\mathcal{O}(\gamma)$. This makes the CVaR a very promising noise-robust loss function for variational quantum algorithms. The variance is amplified significantly less than for PEC, where it grows as $\gamma^2$. However, PEC comes with much stronger theoretical guarantees, since it provides an unbiased estimator rather than a bound. Thus, depending on the application, CVaR might not be applicable.
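The $\mathcal{O}(1/\alpha)$ behavior can be checked empirically. The sketch below draws standard normal samples, a hypothetical stand-in for measured objective values. It estimates the lower CVaR from the $\lceil \alpha N \rceil$ smallest of $N$ samples and repeats this to obtain the estimator's variance. If the variance grows like $1/\alpha$, the product $\alpha N \,\mathrm{Var}$ should stay of order one as $\alpha$ decreases. This is an empirical illustration only, not a substitute for the derivation in Appendix C.

```python
import numpy as np


def cvar_estimator_variance(alpha, n_shots=1000, reps=2000, seed=1):
    """Empirical variance of the lower-CVaR estimator on standard normal samples."""
    rng = np.random.default_rng(seed)
    k = max(1, int(np.ceil(alpha * n_shots)))
    samples = np.sort(rng.standard_normal((reps, n_shots)), axis=1)
    estimates = samples[:, :k].mean(axis=1)          # one CVaR estimate per repetition
    return float(estimates.var(ddof=1))


variances = {}
for alpha in (1.0, 0.1, 0.01):
    variances[alpha] = cvar_estimator_variance(alpha)
    print(f"alpha={alpha:<5} Var={variances[alpha]:.2e}  alpha*N*Var={alpha * 1000 * variances[alpha]:.2f}")
```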
In the remainder of this section, we discuss improvements to the lower and upper bounds for cases where we have more information about the noise-free state, i.e., properties that the bit strings measured from the noise-free state must have but that might not persist under noise. Examples of such properties are particle-number preservation in quantum chemistry Bonet_Monroig_2018_post_selection ; Choquette_2021 and constraint satisfaction in quantum optimization barkoutsos_2020_cvar .
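Suppose a function $g: \{0,1\}^n \to \{0,1\}$ that determines whether a bit string has a required property. Here, $g(x) = 1$ indicates the presence of the property. Further, suppose a given Hamiltonian $H$, which for simplicity we assume to be diagonal and defined by a function $h$. From this, we can construct a modified Hamiltonian $H_{g,c}$ defined by the function

$$h_{g,c}(x) = \begin{cases} h(x), & g(x) = 1, \\ c, & g(x) = 0, \end{cases} \tag{8}$$

where $c \in \mathbb{R}$ is a given constant. In the noise-free case, we thus have $\mathrm{Tr}[H_{g,c}\rho] = \mathrm{Tr}[H\rho]$ for any $c$, since all noise-free samples satisfy $g(x) = 1$. Next, we assume constants $c_{\min}$ and $c_{\max}$ that satisfy $c_{\min} \leq h(x) \leq c_{\max}$ for all $x$ with $g(x) = 1$. Samples with $g(x) = 0$ must have been affected by noise, which allows us to filter out samples whose required property was destroyed by the noise. Setting their value to $c_{\max}$ pushes them out of the lower tail, and setting it to $c_{\min}$ pushes them out of the upper tail. Although some noisy samples may still be feasible, this post-selection reduces the impact of noise. Let $\tilde{X}_{g,c}$ denote the random variable obtained by evaluating $h_{g,c}$ on samples from $\tilde{\rho}$. Due to the equality of the expectation values in the noise-free case and the choice of $c_{\min}$ and $c_{\max}$, we immediately get

$$\mathrm{CVaR}^-_\alpha(\tilde{X}_{g,c_{\max}}) \leq \mathrm{Tr}[H\rho] \leq \mathrm{CVaR}^+_\alpha(\tilde{X}_{g,c_{\min}}) \tag{9}$$

for all $\alpha \in (0, 1/\gamma]$. This can lead to significantly better bounds, since we use additional information about the problem to filter out more noisy samples. For non-diagonal Hamiltonians, see Eq. (7), we can define a separate filter function $g_i$ for each $H_i$.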
Another implication of our results is that the average over the post-selected noisy samples must lie between the lower and upper bounds resulting from the filtered CVaR, due to the monotonicity of the CVaR with respect to $\alpha$. Thus, the CVaR allows us to bound the bias that post-selection may introduce, and it provides a quality measure for the estimated expectation value.
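A small exact example of Eqs. (8) and (9) and of the post-selection argument follows. It assumes a hypothetical 4-bit problem whose noise-free samples all have Hamming weight 2 (the filter $g$), a linear objective $h$, and a toy noise model $\tilde{p} = p/\gamma + (1 - 1/\gamma)\cdot\text{uniform}$. For simplicity, $c_{\min}$ and $c_{\max}$ are taken over all bit strings, which also satisfies the requirement $c_{\min} \leq h(x) \leq c_{\max}$ on feasible strings. With this choice, the filtered bounds can only be tighter than the unfiltered ones.

```python
import numpy as np


def cvar_lower(values, probs, alpha):
    """Exact lower CVaR of a discrete distribution (boundary atom split fractionally)."""
    order = np.argsort(values)
    v, q = np.asarray(values, float)[order], np.asarray(probs, float)[order]
    take = np.clip(alpha - np.concatenate(([0.0], np.cumsum(q)[:-1])), 0.0, q)
    return float(take @ v / alpha)


def cvar_upper(values, probs, alpha):
    return -cvar_lower(-np.asarray(values, float), probs, alpha)


n, gamma = 4, 3.0
xs = np.arange(2 ** n)
bits = (xs[:, None] >> np.arange(n)) & 1
h = bits @ np.array([1.0, 2.0, 3.0, 4.0])          # hypothetical objective h(x)
g = bits.sum(axis=1) == 2                           # filter: feasible iff Hamming weight 2

rng = np.random.default_rng(5)
p = np.where(g, rng.random(2 ** n), 0.0)
p /= p.sum()                                        # noise-free state only yields feasible x
p_tilde = p / gamma + (1 - 1 / gamma) / 2 ** n      # toy noise, p_tilde >= p / gamma
exact = float(p @ h)
alpha = 1 / gamma

c_min, c_max = h.min(), h.max()
h_for_lower = np.where(g, h, c_max)                 # infeasible samples leave the lower tail
h_for_upper = np.where(g, h, c_min)                 # infeasible samples leave the upper tail
bounds = (cvar_lower(h, p_tilde, alpha), cvar_lower(h_for_lower, p_tilde, alpha),
          cvar_upper(h_for_upper, p_tilde, alpha), cvar_upper(h, p_tilde, alpha))
post_selected = float(p_tilde[g] @ h[g] / p_tilde[g].sum())
print("unfiltered/filtered bounds:", [round(b, 3) for b in bounds], "exact:", round(exact, 3))
print("post-selected mean:", round(post_selected, 3))
```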
IV Applications
We now discuss the presented theory on sampling probabilities and CVaR in the context of different applications: first, quantum optimization farhi_2014_qaoa ; barkoutsos_2020_cvar ; egger2021warm ; zoufal_2023_blackbox ; weidenfeller2022scaling , and second, fidelity-based algorithms such as Quantum Support Vector Machines (QSVM) Havlicek2019 ; gentinetta2022complexity ; gentinetta2023quantum and Variational Quantum Time Evolution (VarQTE) McArdle_2019_varqte ; Yuan_2019_varqte ; Zoufal_2021_varqbm ; Zoufal_2023_varqte_error_bounds ; Gacon_2021_qnspsa ; gacon2023stochastic ; gacon2023variational . These are illustrative examples; the theory presented here applies to many other domains, such as quantum chemistry and physics.
IV.1 (Variational) Quantum Optimization
Many variational quantum algorithms have been proposed to solve discrete optimization problems, such as Quadratic Unconstrained Binary Optimization (QUBO). Most of them have a similar structure and interpret every measured bit string as a potential solution to the problem. Proposals that derive variable values from expectation values Bravyi2019 ; fuller2021approximate ; teramoto2023quantumrelaxation ; patti2022variational are, however, not in the focus of our work.
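Suppose a generic unconstrained binary optimization problem of the form

$$\min_{x \in \{0,1\}^n} f(x), \tag{10}$$

where $f: \{0,1\}^n \to \mathbb{R}$ is an objective function on $n$ binary variables. For instance, a QUBO has $f(x) = x^T Q x$ with $Q \in \mathbb{R}^{n \times n}$. For a QUBO, we can apply the change of variables $x_i = (1 - z_i)/2$ with $z_i \in \{-1, 1\}$, replace $z_i$ by the Pauli $Z$ matrix $Z_i$ on qubit $i$, and replace products $z_i z_j$ by $Z_i Z_j$. This defines a diagonal Hamiltonian $H$ and translates Eq. (10) into a ground-state problem lucas_2014_ising

$$\min_{|\psi\rangle} \langle \psi | H | \psi \rangle. \tag{11}$$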
As mentioned in Sec. III, we can transform any generic function $f$ into a Hamiltonian $H_f = \sum_{x} f(x) |x\rangle\langle x|$, where $f(x)$ defines the diagonal element of $H_f$ at the position of the computational basis state $|x\rangle$ zoufal_2023_blackbox .
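The QUBO-to-Ising mapping can be sketched as follows, assuming the convention $x_i = (1 - z_i)/2$ stated above (so $x_i = 0$ corresponds to the $Z$ eigenvalue $+1$). The function returns an offset, linear coefficients, and couplings. It then checks on a hypothetical $3 \times 3$ QUBO that both forms agree on every bit string.

```python
import itertools

import numpy as np


def qubo_to_ising(Q):
    """Rewrite x^T Q x (x in {0,1}^n) as offset + sum_i h_i z_i + sum_{i<j} J_ij z_i z_j via x_i = (1 - z_i) / 2."""
    Q = np.asarray(Q, dtype=float)
    S = Q + Q.T                      # off-diagonal weight of the pair {i, j}: Q_ij + Q_ji
    np.fill_diagonal(S, 0.0)
    d = np.diag(Q)                   # x_i^2 = x_i, so diagonal entries are linear terms
    offset = d.sum() / 2 + S.sum() / 8
    h = -d / 2 - S.sum(axis=1) / 4
    J = np.triu(S, k=1) / 4
    return offset, h, J


def ising_energy(offset, h, J, z):
    z = np.asarray(z, dtype=float)
    return offset + h @ z + z @ J @ z


Q = np.array([[1.0, -2.0, 0.5], [0.0, 3.0, 1.0], [-1.0, 0.0, -2.0]])   # hypothetical QUBO
offset, h, J = qubo_to_ising(Q)
for bits in itertools.product([0, 1], repeat=3):
    x = np.array(bits, dtype=float)
    z = 1 - 2 * x                                    # x = 0 <-> z = +1 (Z eigenvalue of |0>)
    assert np.isclose(x @ Q @ x, ising_energy(offset, h, J, z))
print("offset:", offset, "h:", h, "J:", J, sep="\n")
```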
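Most variational quantum algorithms for binary optimization are defined via a parameterized ansatz $|\psi(\boldsymbol{\theta})\rangle$ with parameters $\boldsymbol{\theta} \in \mathbb{R}^m$, a loss function $\mathcal{L}: \mathbb{R}^m \to \mathbb{R}$ that maps parameter values to a loss value, and an optimizer to solve

$$\min_{\boldsymbol{\theta}} \mathcal{L}(\boldsymbol{\theta}). \tag{12}$$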
After the final parameters are determined, the resulting state is measured and the sampled bit strings are used as potential solutions to the problem. Samples obtained during the execution of the algorithm can also be considered as solutions in case they achieve better objective values than the final samples.
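If we set $\mathcal{L}(\boldsymbol{\theta}) = \langle\psi(\boldsymbol{\theta})|H|\psi(\boldsymbol{\theta})\rangle$ for some ansatz $|\psi(\boldsymbol{\theta})\rangle$, we obtain the Variational Quantum Eigensolver (VQE) peruzzo_2014_vqe . Further, if we define the ansatz as

$$|\psi(\boldsymbol{\beta}, \boldsymbol{\varphi})\rangle = e^{-i\beta_p H_M} e^{-i\varphi_p H} \cdots e^{-i\beta_1 H_M} e^{-i\varphi_1 H} |+\rangle^{\otimes n}, \tag{13}$$

we obtain the QAOA farhi_2014_qaoa , where $p$ defines the depth, $\boldsymbol{\beta}, \boldsymbol{\varphi} \in \mathbb{R}^p$ are the variational parameters, and $H_M = \sum_{i=1}^{n} X_i$, where $X_i$ denotes the Pauli $X$ matrix on qubit $i$.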
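Below is a minimal statevector sketch of the QAOA ansatz in Eq. (13) for a hypothetical 4-node ring. For readability, it uses the cut value itself as the diagonal $H$ and reports the expected cut, which is to be maximized, rather than the minimization form of Eq. (11). A grid search over the two angles of depth $p = 1$ stands in for the classical optimizer of Eq. (12). Dense statevectors limit this sketch to a handful of qubits.

```python
import numpy as np


def cut_values(n, edges):
    """Diagonal of the MAXCUT objective: number of cut edges for every basis state |x>."""
    x = np.arange(2 ** n)
    bits = (x[:, None] >> np.arange(n)) & 1
    return sum((bits[:, i] != bits[:, j]).astype(float) for i, j in edges)


def apply_mixer(psi, n, beta):
    """Apply exp(-i beta H_M) with H_M = sum_i X_i, one qubit at a time."""
    idx = np.arange(2 ** n)
    for i in range(n):
        psi = np.cos(beta) * psi - 1j * np.sin(beta) * psi[idx ^ (1 << i)]
    return psi


def qaoa_expectation(n, edges, betas, phis):
    """Expected cut value of the QAOA state of Eq. (13) with the cut objective as H."""
    c = cut_values(n, edges)
    psi = np.full(2 ** n, 2 ** (-n / 2), dtype=complex)      # |+> on every qubit
    for beta, phi in zip(betas, phis):
        psi = np.exp(-1j * phi * c) * psi                     # exp(-i phi H), H diagonal
        psi = apply_mixer(psi, n, beta)
    return float(np.real(np.vdot(psi, c * psi)))


ring = [(0, 1), (1, 2), (2, 3), (3, 0)]     # hypothetical 4-node ring, maximum cut = 4
grid = np.linspace(0, np.pi, 41)
best = max(qaoa_expectation(4, ring, [b], [g]) for b in grid for g in grid)
print(f"best p=1 expected cut on the grid: {best:.3f}")
```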
The results from Sec. II and III apply directly to QAOA. Suppose we already have a quantum circuit that, when executed and measured in an ideal noise-free setting, produces good solutions to a given optimization problem. Sec. II then implies that, when the circuit is executed on a noisy device, a sampling overhead of $\gamma$ suffices to extract solutions of the same quality as in the noise-free case. In certain cases, it might be feasible to determine the optimal parameters classically streif2019training ; sack2021quantum and use the quantum computer only to sample good solutions, since evaluating (local) expectation values might be easier than sampling from the full circuit begusic023simulating . However, when we must train the parameterized quantum circuit, we can replace the expectation value by the CVaR barkoutsos_2020_cvar . The results of Sec. III then provide guidance on how to choose $\alpha$ and on the sampling overhead required to get good results from a noisy device. We illustrate this on concrete examples in Sec. V.1 and Sec. V.2.
Our results allow us to apply proven performance guarantees for noise-free QAOA to noisy hardware. For MAXCUT on 3-regular graphs, QAOA achieves a worst-case approximation ratio of 0.6924 for $p = 1$ farhi_2014_qaoa , 0.7559 for $p = 2$, and (under certain assumptions) 0.7924 for $p = 3$ wurtz_2021_qaoa . With a sampling overhead of $\gamma$, these guarantees are recovered even in the noisy regime. Furthermore, for 3-regular graphs, we can always train QAOA with $p \leq 3$ classically by simulating at most 30 qubits at a time Sack2023 . That is, we can determine the optimal parameters via classical simulation and then sample good solutions from the quantum computer with a $\gamma$ overhead. Since $\gamma$ grows exponentially with the circuit size, the sampling overhead introduced to combat noise may exceed the cost of a brute-force search. A simple back-of-the-envelope calculation, discussed in Appendix D, determines the minimum layer fidelity required to run a depth-$p$ QAOA.
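The comparison with brute force can be sketched with a toy cost model in which every CNOT contributes the same factor $\gamma_{\mathrm{CNOT}}$ to the overhead, so that $\gamma = \gamma_{\mathrm{CNOT}}^{N}$ for $N$ CNOT gates. This per-gate model and the numbers below are assumptions for illustration; Appendix D contains the actual estimate. The sketch returns the largest CNOT count for which sampling $\gamma$ times more often is still no more expensive than enumerating all $2^n$ bit strings.

```python
import math


def max_cnots_before_brute_force(n_qubits, gamma_cnot):
    """Largest CNOT count N with gamma_cnot**N <= 2**n_qubits (toy per-gate overhead model)."""
    return math.floor(n_qubits * math.log(2) / math.log(gamma_cnot))


def overhead_exceeds_brute_force(n_qubits, gamma):
    """True if sampling gamma times more often costs more than evaluating all 2**n bit strings."""
    return gamma > 2 ** n_qubits


n_qubits, gamma_cnot = 40, 1.02      # hypothetical values
N = max_cnots_before_brute_force(n_qubits, gamma_cnot)
print(N, overhead_exceeds_brute_force(n_qubits, gamma_cnot ** (N + 1)))
```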
The Quantum Alternating Operator Ansatz (QAOA’) is a variant of QAOA hadfield_quantum_2019 . Here, a constraint, e.g., a fixed Hamming weight (i.e., a fixed number of ones in a bit string), is enforced by changing the mixer to preserve such states wang2020xymixers ; cook2020vertexcover ; golden2023numerical and by starting in (a superposition of) feasible states baertschi2022shortdepth ; baertschi2020grover . Thus, if QAOA’ is executed noise-free, all resulting samples satisfy the given constraint. This is an example where a filter function, as introduced in Sec. III, helps to improve the CVaR bounds on the corresponding expectation value.
IV.2 Fidelities
Several quantum algorithms leverage fidelity estimation between two quantum states in a sub-routine. In the following, we first discuss how to leverage the CVaR bounds to approximate fidelities on noisy quantum computers and then how this impacts two concrete classes of algorithms: QSVMs and VarQTE.
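Suppose we have $n$-qubit quantum circuits $U$ and $V$ that define $|\psi_1\rangle = U|0\rangle^{\otimes n}$ and $|\psi_2\rangle = V|0\rangle^{\otimes n}$, respectively. A common approach to estimating the fidelity between $|\psi_1\rangle$ and $|\psi_2\rangle$ is the compute-uncompute method, given by

$$F(\psi_1, \psi_2) = |\langle\psi_1|\psi_2\rangle|^2 = \big|\langle 0|^{\otimes n} U^\dagger V |0\rangle^{\otimes n}\big|^2. \tag{14}$$

$F(\psi_1, \psi_2)$ is thus the probability of measuring $0^n$ for the state $U^\dagger V|0\rangle^{\otimes n}$. This also equals the expectation value of the diagonal Hamiltonian $H = |0\rangle\langle 0|^{\otimes n}$ in the state $U^\dagger V|0\rangle^{\otimes n}$. Thus, we can use $\mathrm{CVaR}^+_\alpha$ to obtain an upper bound on the noise-free fidelity. Here, the resulting random variable follows a Bernoulli distribution, since the expectation value counts the number of measured $0^n$ outcomes and ignores all other outcomes. Since the variance of the CVaR for a Bernoulli random variable scales as $\mathcal{O}(1/\alpha)$, see Sec. III, we can set $\alpha = 1/\gamma$ and use Eq. (6) to upper bound the fidelity with a sampling overhead of $\mathcal{O}(\gamma)$. In comparison, PEC requires $\mathcal{O}(\gamma^2)$ to obtain an unbiased estimate.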
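For the compute-uncompute estimator, the upper CVaR of a 0/1 variable has a closed form, which the sketch below evaluates from measurement counts. The counts dictionary (bit strings mapped to frequencies, a common format for measurement results) and its values are hypothetical. The $\lceil \alpha N \rceil$ rounding is our choice of Monte Carlo estimator.

```python
import math


def fidelity_upper_bound(counts, n_qubits, alpha):
    """Estimate CVaR^+_alpha of the indicator 1[x = 0^n] from compute-uncompute counts.

    For a 0/1 variable, the largest ceil(alpha * N) outcomes contain min(k0, m) ones,
    where k0 is the number of all-zero outcomes, so the estimate is min(1, k0 / m).
    """
    shots = sum(counts.values())
    k0 = counts.get("0" * n_qubits, 0)
    m = max(1, math.ceil(alpha * shots))
    return min(1.0, k0 / m)


counts = {"000": 50, "001": 700, "110": 250}    # hypothetical noisy counts (1000 shots)
print(fidelity_upper_bound(counts, 3, alpha=0.1))
```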
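QSVMs leverage a quantum feature map to define a quantum kernel and can provably outperform classical computers on certain tasks Liu_2021 . The quantum feature map is a parameterized quantum circuit $U(\mathbf{x})$ that takes a classical feature vector $\mathbf{x}$ as input and prepares a corresponding quantum state $\rho(\mathbf{x}) = U(\mathbf{x})|0\rangle\langle 0|^{\otimes n} U^\dagger(\mathbf{x})$. The quantum kernel $K(\mathbf{x}, \mathbf{x}') = \mathrm{Tr}[\rho(\mathbf{x})\rho(\mathbf{x}')]$ is then defined via the Hilbert-Schmidt inner product of $\rho(\mathbf{x})$ and $\rho(\mathbf{x}')$ for two classical data points $\mathbf{x}, \mathbf{x}'$ from some training set. It equals $|\langle 0|^{\otimes n} U^\dagger(\mathbf{x}) U(\mathbf{x}')|0\rangle^{\otimes n}|^2$ and thus falls exactly into the case above.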
VarQTE for real or imaginary time evolution assumes a given parameterized quantum state $|\psi(\boldsymbol{\theta})\rangle$ and projects the exact state evolution onto a parameter evolution of the ansatz. This approximates the desired time evolution within the subspace that the ansatz can represent. The exact projection requires evaluating the quantum geometric tensor (QGT) McArdle_2019_varqte ; Yuan_2019_varqte ; Zoufal_2023_varqte_error_bounds , which quickly becomes prohibitive as the number of parameters increases. Thus, multiple approximate variants of VarQTE have been proposed that work around the evaluation of the QGT Gacon_2021_qnspsa ; gacon2023stochastic ; gacon2023variational . Many of these approximations exploit the fact that, up to higher-order terms, the Hessian of the fidelity $|\langle\psi(\boldsymbol{\theta})|\psi(\boldsymbol{\theta} + \boldsymbol{\delta})\rangle|^2$ with respect to $\boldsymbol{\delta}$ is proportional to the QGT of $|\psi(\boldsymbol{\theta})\rangle$. They either use Simultaneous Perturbation Stochastic Approximation (SPSA) to estimate the Hessian from fidelity evaluations as an approximation of the QGT, or they construct alternative loss functions that directly use this fidelity without constructing an approximate QGT. In all variants, the parameter perturbations $\boldsymbol{\delta}$ are small, which implies fidelities close to one. This is the regime in which the noisy CVaR is very close to the noise-free expectation value, i.e., the sweet spot of the introduced approximation.
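The relation between the fidelity Hessian and the QGT can be verified numerically. The sketch below uses a hypothetical one-qubit ansatz $R_Z(b) R_Y(a)|0\rangle$. It computes the Hessian of $\boldsymbol{\delta} \mapsto |\langle\psi(\boldsymbol{\theta})|\psi(\boldsymbol{\theta}+\boldsymbol{\delta})\rangle|^2$ by finite differences and compares it with $-2$ times the real part of the QGT (the Fubini-Study metric). For this ansatz, the metric is $\mathrm{diag}(1/4, \sin^2(a)/4)$.

```python
import numpy as np


def state(theta):
    """Hypothetical one-qubit ansatz |psi(a, b)> = RZ(b) RY(a) |0>."""
    a, b = theta
    return np.array([np.exp(-0.5j * b) * np.cos(a / 2), np.exp(0.5j * b) * np.sin(a / 2)])


def fidelity(theta, delta):
    """|<psi(theta)|psi(theta + delta)>|^2."""
    return abs(np.vdot(state(theta), state(theta + delta))) ** 2


def fidelity_hessian(theta, h=1e-3):
    """Central finite-difference Hessian of delta -> fidelity(theta, delta) at delta = 0."""
    m = len(theta)
    steps = np.eye(m) * h
    hess = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            ei, ej = steps[i], steps[j]
            hess[i, j] = (fidelity(theta, ei + ej) - fidelity(theta, ei - ej)
                          - fidelity(theta, -ei + ej) + fidelity(theta, -ei - ej)) / (4 * h * h)
    return hess


def qgt_real(theta, h=1e-6):
    """Re(<d_i psi|d_j psi> - <d_i psi|psi><psi|d_j psi>), with derivatives by central differences."""
    psi = state(theta)
    d = [(state(theta + h * e) - state(theta - h * e)) / (2 * h) for e in np.eye(len(theta))]
    return np.array([[np.real(np.vdot(di, dj) - np.vdot(di, psi) * np.vdot(psi, dj)) for dj in d]
                     for di in d])


theta = np.array([0.7, 0.3])
print(np.round(fidelity_hessian(theta), 5))
print(np.round(-2 * qgt_real(theta), 5))
```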
V Experiments
In this section, we analyze two optimization problems from the literature to demonstrate the theory presented in this paper. In both cases, we run QAOA circuits on ibm_sherbrooke ibm_quantum_devices : first, smaller but deeper circuits, and second, larger but shallower circuits. We consistently find close agreement between the theory and the experimental results. All results in this section are obtained without twirling the circuits. For a comparison and discussion of twirled and untwirled circuits, see Appendix A.
ibm_sherbrooke is a 127-qubit superconducting device with an echoed cross-resonance (ECR) gate as its two-qubit gate Sheldon2016 . This gate is equivalent to a CNOT gate up to single-qubit gates and has a fixed direction on the hardware. We let the transpiler handle the mapping from CNOT gates to ECR gates and, for better readability, refer to CNOT gates in the following.
V.1 QAOA for MAXCUT on 3-regular graphs with 40 nodes
![Refer to caption](x1.png)
In this section, we examine QAOA for MAXCUT on a random three-regular graph with 40 nodes, i.e., on 40 qubits. We take the problem instance from Ref. Sack2023 and optimize the parameters classically for QAOA with depth $p = 1$ and $p = 2$ using light-cone simplifications. This allows us to evaluate the required 2-local expectation values by simulating at most 14 qubits at a time; see Ref. Sack2023 for details. The circuits and optimal parameters are further discussed in Appendix E.
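The light-cone argument used for the classical parameter optimization can be sketched as follows. For depth-$p$ QAOA, the expectation value of $Z_u Z_v$ depends only on qubits within graph distance $p$ of $u$ or $v$. On a 3-regular graph without short cycles, this neighborhood is a tree with $2(2^{p+1} - 1)$ nodes, i.e., 6, 14, and 30 qubits for $p = 1, 2, 3$, consistent with the qubit counts quoted in this paper; graphs with short cycles have smaller light cones. The tree construction below is a hypothetical worst case, not the instance from Ref. Sack2023 .

```python
def edge_light_cone(adj, u, v, p):
    """Qubits within graph distance p of u or v: the support of <Z_u Z_v> for depth-p QAOA."""
    seen, frontier = {u, v}, {u, v}
    for _ in range(p):
        frontier = {w for x in frontier for w in adj[x]} - seen
        seen |= frontier
    return seen


def tree_neighbourhood(p, degree=3):
    """Adjacency of a cycle-free degree-regular neighbourhood of the edge (0, 1) up to depth p."""
    adj = {0: [1], 1: [0]}
    frontier, next_id = [0, 1], 2
    for _ in range(p):
        new_frontier = []
        for x in frontier:
            while len(adj[x]) < degree:
                adj[x].append(next_id)
                adj[next_id] = [x]
                new_frontier.append(next_id)
                next_id += 1
        frontier = new_frontier
    return adj


for p in (1, 2, 3):
    size = len(edge_light_cone(tree_neighbourhood(p), 0, 1, p))
    print(p, size, 2 * (2 ** (p + 1) - 1))
```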
We apply staggered dynamic decoupling for error suppression, as discussed in Appendix F. The circuits are constructed such that they consist of only two different layers of CNOT gates on a line of 40 qubits, denoted by $q_0, \dots, q_{39}$. The first layer consists of 20 CNOT gates on qubit pairs $(q_i, q_{i+1})$ for even $i$, and the second consists of 19 CNOT gates on $(q_i, q_{i+1})$ for odd $i$. Using the technique introduced in Ref. mckay2023benchmarking , we measure the LF of these two layers, denoted by $\mathrm{LF}_1$ and $\mathrm{LF}_2$, respectively.¹ We take the geometric average over the total number of CNOT gates and derive a CNOT fidelity as $F_{\mathrm{CNOT}} = (\mathrm{LF}_1 \mathrm{LF}_2)^{1/39}$. This also allows us to compute the error per layered gate (EPLG) of Ref. mckay2023benchmarking as $\mathrm{EPLG} = 1 - F_{\mathrm{CNOT}}$. We also define the corresponding per-CNOT sampling overhead $\gamma_{\mathrm{CNOT}}$. In total, the circuits for $p = 1$ and $p = 2$ have 461 and 922 CNOT gates, respectively, all arranged in the aforementioned layers. From this, we compute the sampling overhead $\gamma$ and the corresponding $\alpha = 1/\gamma$ for $p = 1$ and $p = 2$. The IBM Quantum Platform ibm_quantum_devices regularly reports an EPLG for ibm_sherbrooke, evaluated over a chain of 100 qubits. At the time of the experiment, the backend reported an EPLG slightly higher than our measured EPLG. This is expected, since we restrict ourselves to 40 qubits. In any case, the EPLG reported by the backend is a good first proxy for estimating the LF and the resulting $\gamma$ when executing a particular circuit on a device.

¹ At the time of writing, the experiment to measure layer fidelity is being implemented in Qiskit Experiments QiskitExperiments . See https://github.com/Qiskit-Extensions/qiskit-experiments.
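The arithmetic from layer fidelities to a per-CNOT fidelity, the EPLG, and the shot budget can be sketched as follows. The LF values and $\gamma$ are hypothetical placeholders, not the measured values. The shot count simply inverts the relation that keeping the best $\alpha = 1/\gamma$ fraction of $N$ shots leaves $N/\gamma$ samples.

```python
import math


def cnot_fidelity(layer_fidelities, cnots_per_layer):
    """Geometric average over all CNOTs in the layers: (prod_l LF_l)^(1 / total CNOTs)."""
    return math.prod(layer_fidelities) ** (1.0 / sum(cnots_per_layer))


def shots_needed(samples_kept, gamma):
    """Shots such that keeping the best alpha = 1/gamma fraction leaves `samples_kept` samples."""
    return math.ceil(samples_kept * gamma)


lf = [0.80, 0.82]                       # hypothetical LF_1, LF_2 (20 and 19 CNOTs)
f_cnot = cnot_fidelity(lf, [20, 19])
eplg = 1.0 - f_cnot
print(f"F_CNOT = {f_cnot:.5f}, EPLG = {eplg:.4f}")
print(shots_needed(137, gamma=500))     # hypothetical gamma
```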
| | $p = 1$ | $p = 2$ |
|---|---|---|
| global optimum | 56 | 56 |
| | 30.2 | 29.9 |
| | 41.5 | 45.3 |
| | 43.1 | 48.5 |
| best sampled value | 47 | 50 |
| number of CNOT gates | 461 | 922 |
| | 1.0290 | |
To apply the CVaR bounds, we run the circuits for $p = 1$ and $p = 2$ with shot counts chosen so that 137 and 19 samples, respectively, remain for estimating the CVaR after sorting and keeping the best $\alpha = 1/\gamma$ fraction. As predicted, the data confirm that $\mathrm{CVaR}^+_{1/\gamma}$ provides an upper bound (MAXCUT being a maximization problem) on the noise-free expectation values; see Fig. 1 and Tab. 1. The CVaR upper bound exceeds the noise-free value for both $p = 1$ and $p = 2$, see Tab. 1.
We also use the noise-free expectation values obtained from the light-cone simulation to calibrate an $\alpha_{\mathrm{eff}}$ such that the CVaR, denoted by $\mathrm{CVaR}^+_{\alpha_{\mathrm{eff}}}$, exactly matches the noise-free result. This yields an induced effective overhead $\gamma_{\mathrm{eff}} = 1/\alpha_{\mathrm{eff}}$, which we compare to the true $\gamma$. We find that $\gamma_{\mathrm{eff}}$ is quite stable across the different $p$ and significantly smaller than $\gamma$, see Tab. 1. This may indicate that the observable of interest is not affected by all the errors that can occur. Crucially, this observation may allow us to calibrate $\alpha$ for a particular application, e.g., by running circuits of similar structure with known noise-free results, and to choose larger values than implied by the LF. This may reduce the sampling overhead in certain scenarios while still achieving good results. However, in general, the lower/upper bounds proven in Sec. III no longer hold for $\alpha > 1/\gamma$.
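The calibration of $\alpha_{\mathrm{eff}}$ can be sketched as a one-dimensional root search, since the upper CVaR is non-increasing in $\alpha$. To make the search well defined, the sketch uses the exact CVaR of the empirical distribution rather than a rounded top-$k$ average. Each sample carries mass $1/N$, and the boundary sample is split fractionally. The samples and the noise-free target value are hypothetical.

```python
import numpy as np


def upper_cvar(samples, alpha):
    """CVaR^+_alpha of the empirical distribution (each sample has mass 1/N; boundary split)."""
    s = np.sort(np.asarray(samples, dtype=float))[::-1]
    mass = 1.0 / len(s)
    take = np.clip(alpha - mass * np.arange(len(s)), 0.0, mass)
    return float(take @ s / alpha)


def calibrate_alpha(samples, target, iters=200):
    """Bisection for alpha_eff with CVaR^+_alpha_eff = target (CVaR^+ is non-increasing in alpha)."""
    if not upper_cvar(samples, 1.0) <= target <= np.max(samples):
        raise ValueError("target must lie between the sample mean and the sample maximum")
    lo, hi = 1e-12, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if upper_cvar(samples, mid) >= target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = np.random.default_rng(11)
samples = rng.integers(20, 51, size=5000)      # hypothetical noisy cut values
noise_free_value = 45.0                        # hypothetical noise-free expectation value
alpha_eff = calibrate_alpha(samples, noise_free_value)
print(f"alpha_eff = {alpha_eff:.4f}, gamma_eff = {1 / alpha_eff:.1f}")
```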
Comparing the $\mathrm{CVaR}^+_{1/\gamma}$ values and the best samples with the globally optimal solution of 56, we find that the best samples achieve approximation ratios of $47/56 \approx 0.84$ for $p = 1$ and $50/56 \approx 0.89$ for $p = 2$. These ratios, as well as those achieved by the CVaR, exceed the corresponding theoretical lower bounds of 0.6924 ($p = 1$) and 0.7559 ($p = 2$) discussed in Sec. IV.1.
V.2 QAOA on Hardware-efficient Higher-Order Ising Model with 127 variables
![Refer to caption](x2.png)
We now show results of running QAOA on higher-order spin glass models. Originally described in Refs. pelofske2023qavsqaoa ; pelofske2023short , these models are designed for a heavy-hex connectivity graph Chamberland_2020 of ibm_sherbrooke.
We define a minimization problem for the following cost Hamiltonian, which corresponds to a random-coefficient spin glass problem with cubic terms. Its connectivity graph is defined to be compatible with an arbitrary heavy-hex lattice graph $G = (V, E)$, see Fig. 2:

$$H_C = \sum_{v \in V} d_v Z_v + \sum_{(u,v) \in E} d_{u,v} Z_u Z_v + \sum_{l \in W} d_{l, n_1(l), n_2(l)} Z_{n_1(l)} Z_l Z_{n_2(l)}. \tag{15}$$

Since $G$ is a connected bipartite graph with vertex set $V$, it is uniquely bipartitioned as $V = V_2 \cup V_3$, with all edges running between $V_2$ and $V_3$, where $V_2$ consists of vertices of degree at most 2. In Eq. (15), $W$ denotes the subset of vertices in $V_2$ of degree exactly 2. Each node $l \in W$ has two neighbors, denoted by $n_1(l)$ and $n_2(l)$. The coefficients $d_v$, $d_{u,v}$, and $d_{l,n_1(l),n_2(l)}$ are the randomly selected linear, quadratic, and cubic coefficients, respectively. They are chosen from $\{+1, -1\}$ with equal probability. An example of such a random higher-order Ising model is shown in Fig. 2.
We use the qubits in $V_2$ to compute and uncompute the parities of the quadratic and cubic terms in which they are contained. The corresponding $ZZ$ and $ZZZ$ phase rotations are then realized with $R_Z$ rotations on these parity qubits. Computing and uncomputing parities requires 2 and 4 CNOT gates for the quadratic and cubic terms, respectively. However, the CNOT gates for $Z_{n_1(l)} Z_l$ and $Z_l Z_{n_2(l)}$ can be subsumed into the CNOT gates for $Z_{n_1(l)} Z_l Z_{n_2(l)}$.
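The subsumption of the quadratic-term CNOTs can be checked by tracking the classical parity stored on a degree-2 qubit $l$ with neighbors $n_1$ and $n_2$. The ordering below is one possible sequence, assumed for illustration; the actual circuits are described in Appendix G. With four CNOTs targeting $l$, the three $R_Z$ slots see exactly the parities needed for the two quadratic terms and the cubic term, and $l$ is restored at the end.

```python
import itertools


def parity_qubit_history(z_n1, z_l, z_n2):
    """Bit on parity qubit l after each of CNOT(n1->l), CNOT(n2->l), CNOT(n1->l), CNOT(n2->l).

    An R_Z rotation on l after the 1st, 2nd and 3rd CNOT acts on the parities of
    Z_n1 Z_l, Z_n1 Z_l Z_n2 and Z_l Z_n2, respectively; the 4th CNOT restores l.
    """
    l, history = z_l, []
    for control in (z_n1, z_n2, z_n1, z_n2):
        l ^= control
        history.append(l)
    return history


for z_n1, z_l, z_n2 in itertools.product((0, 1), repeat=3):
    assert parity_qubit_history(z_n1, z_l, z_n2) == [z_n1 ^ z_l, z_n1 ^ z_l ^ z_n2, z_l ^ z_n2, z_l]
print("4 CNOTs cover all three terms (computing each parity separately would use 2 + 2 + 4 = 8)")
```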
Furthermore, a bipartite graph of maximum degree 3 admits a 3-edge-coloring by Kőnig's line coloring theorem. Hence, these CNOT gates can be scheduled simultaneously for all terms in just three non-overlapping layers pelofske2023qavsqaoa . Depth-$p$ QAOA circuits for these problems thus have a CNOT depth of only $6p$, independent of the system size $n$. Further circuit details are given in Appendix G.
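The 3-edge-coloring can be constructed with the classic alternating-path argument behind Kőnig's line coloring theorem. Each color class is a set of non-overlapping edges and hence one layer of simultaneous CNOTs. The sketch below is a generic implementation, not the routine used in Ref. pelofske2023qavsqaoa . It is exercised on two hypothetical bipartite graphs of maximum degree 3: $K_{3,3}$, and a cube graph with every edge subdivided, whose degree-2/degree-3 structure resembles heavy-hex.

```python
from collections import defaultdict


def bipartite_edge_coloring(edges, num_colors=3):
    """Proper edge coloring of a bipartite graph with maximum degree <= num_colors.

    Each color class is a set of non-overlapping edges, i.e., one layer of CNOT gates
    that can be executed simultaneously.
    """
    at = defaultdict(dict)                  # at[node][color] = neighbor along that color
    for u, v in edges:
        a = next(c for c in range(num_colors) if c not in at[u])   # free at u
        b = next(c for c in range(num_colors) if c not in at[v])   # free at v
        if a in at[v]:
            # Swap colors a/b along the alternating path starting at v. In a bipartite
            # graph this path cannot reach u, so afterwards a is free at both u and v.
            path, x, c = [], v, a
            while c in at[x]:
                y = at[x][c]
                path.append((x, y, c))
                x, c = y, (b if c == a else a)
            for x, y, c in path:
                del at[x][c], at[y][c]
            for x, y, c in path:
                swapped = b if c == a else a
                at[x][swapped], at[y][swapped] = y, x
        at[u][a], at[v][a] = v, u
    return {frozenset((x, y)): c for x in list(at) for c, y in at[x].items()}


def is_proper(coloring):
    """True if no node has two incident edges of the same color."""
    seen = set()
    for edge, c in coloring.items():
        for node in edge:
            if (node, c) in seen:
                return False
            seen.add((node, c))
    return True


k33 = [(i, j) for i in range(3) for j in range(3, 6)]
cube = [(i, i ^ (1 << k)) for i in range(8) for k in range(3) if i < i ^ (1 << k)]
subdivided = [e for idx, (i, j) in enumerate(cube) for e in ((i, 100 + idx), (100 + idx, j))]
for graph in (k33, subdivided):
    coloring = bipartite_edge_coloring(graph)
    layers = [[tuple(sorted(e)) for e, c in coloring.items() if c == k] for k in range(3)]
    print(len(coloring), "edges in CNOT layers of sizes", [len(layer) for layer in layers])
```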
Leveraging parameter transfer of QAOA angles between problems with the same structure but different numbers of qubits allows us to obtain good angles for these 127-qubit QAOA circuits for $p = 1, \dots, 5$ without on-device variational learning heavy_hex_QAOA_parameter_transfer2023 . Additionally, we use converged MPS simulations with a bond dimension of to verify that the fixed QAOA angles produce good expectation values for all circuits heavy_hex_QAOA_parameter_transfer2023 . The hardware-compatible circuits are run on the ibm_sherbrooke device, again using staggered dynamical decoupling for error suppression, see Appendix F. The optimal solutions of the higher-order Ising models were computed using CPLEX cplexv12 ; heavy_hex_QAOA_parameter_transfer2023 .
As before in Sec. V.1, we only have a small number of unique layers of CNOT gates. Since we need to cover a graph of degree three, we need at least three layers (see Appendix G), with 144 CNOT gates in total. The measured LF for the three layers is , and . These fidelities are significantly smaller than for the 40-qubit circuits in Sec. V.1, because the qubits and gates on a 127-qubit device are not uniform: some are better and some are worse. For 40 qubits, we could select the best line of 40 qubits (see Appendix E), while for 127 qubits we had to use the whole chip. From this, we can again compute the CNOT fidelity , , and . The results for evaluating the circuit for $p = 1, \dots, 5$, each with shots, are provided in Fig. 3 and Tab. 2. Because of the significantly lower fidelities, the number of shots required to apply the analytic CVaR bounds is significantly higher and currently impractical. However, as in Sec. V.1, the effective is significantly smaller, even smaller than for the longer 40-qubit circuits. Further, the noisy expectation values still improve from $p = 1$ to $p = 4$ and only start to get worse for $p = 5$.
Finally, we use bootstrapping to confirm the scaling of the CVaR variance with respect to . More precisely, we uniformly sample values from the results collected using ibm_sherbrooke and estimate the CVaR for the five values of reported in Tab. 2. We repeat this times to estimate the variance of the resulting CVaR estimators. The results are provided in Fig. 4 and show close agreement with the theory presented in Sec. III.
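The bootstrap procedure can be sketched as follows. The data are synthetic Gaussian values standing in for measured objective values (not the ibm_sherbrooke results), and the CVaR estimator (mean of the lowest $\lceil \alpha N \rceil$ samples), the shot counts, and the number of repetitions are illustrative assumptions. If the variance scales as $1/N$, the product $N \cdot \mathrm{Var}$ stays roughly constant.

```python
import numpy as np


def cvar(samples, alpha):
    """Mean of the lowest ceil(alpha * N) samples (lower-tail CVaR for minimization)."""
    x = np.sort(samples)
    k = max(1, int(np.ceil(alpha * len(x))))
    return x[:k].mean()


def bootstrap_cvar_variance(data, alpha, n_shots, repeats, rng):
    """Resample n_shots values with replacement `repeats` times; return Var[CVaR estimate]."""
    estimates = [cvar(rng.choice(data, size=n_shots, replace=True), alpha)
                 for _ in range(repeats)]
    return np.var(estimates, ddof=1)


rng = np.random.default_rng(seed=1)
# Synthetic stand-in for measured objective values.
data = rng.normal(loc=-80.0, scale=10.0, size=100_000)
for n_shots in (250, 1_000, 4_000):
    var = bootstrap_cvar_variance(data, alpha=0.1, n_shots=n_shots, repeats=500, rng=rng)
    print(n_shots, var, n_shots * var)   # n_shots * var should stay roughly constant
```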
![Refer to caption](x3.png)
p | CNOT gates | Noise-free mean (MPS) | Noisy mean | Noisy minimum |
---|---|---|---|---|
1 | 288 | -79.79 | -64.54 | -136 |
2 | 576 | -109.35 | -81.11 | -154 |
3 | 864 | -125.37 | -86.97 | -154 |
4 | 1152 | -137.22 | -88.46 | -156 |
5 | 1440 | -145.54 | -85.78 | -164 |
![Refer to caption](x4.png)
VI Conclusion
We examined how hardware noise affects the quality of bit strings sampled from quantum circuits on noisy quantum computers. We proved and demonstrated that the noise can be compensated by increasing the number of samples inversely proportional to the circuit’s layer fidelity, or equivalently, proportional to . This is considerably less than the overhead of error mitigation strategies such as probabilistic error cancellation, which scales as , albeit PEC yields unbiased estimators of expectation values instead of bounds. Furthermore, we proved that the Conditional Value at Risk provides bounds on noise-free expectation values using noisy samples, providing the theoretical foundation for CVaR as a loss function in variational algorithms and thus closing a gap in the literature. We also discussed the potential of this theory to benefit other algorithms, such as Quantum Support Vector Machines or approximate Variational Quantum Time Evolution.
Our primary focus was on errors occurring during circuit execution. However, other error sources, notably State Preparation and Measurement (SPAM) errors, also affect performance on noisy devices. The methodologies developed in this paper can be adapted to account for SPAM errors, either by increasing sampling overhead or applying other mitigation techniques, like statistical readout error mitigation.
The latter may allow certain errors to be mitigated without additional sampling overhead, but might require additional calibration circuits. Investigating the impact of SPAM errors remains an intriguing direction for future research.
Acknowledgments.
The authors want to thank Almudena Carrera Vazquez, Julien Gacon, Youngseok Kim, David McKay, Diego Ristè, David Sutter, Kristan Temme, Minh Tran, and James Wootton for insightful discussions and recommendations to improve the theoretical and experimental results as well as the whole manuscript. Further, M.L. and S.W. acknowledge the support of the Swiss National Science Foundation, SNF grant No. 214919.
E.P., A.B., and S.E. acknowledge the support of (i) the Beyond Moore’s Law thrust of the Advanced Simulation and Computing Program (NNSA ASC) at Los Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC, for the National Nuclear Security Administration of the U.S. Department of Energy (Contract No. 89233218CNA000001), and (ii) LANL’s Institutional Computing program. LANL report LA-UR-23-33295.
Appendix A Assumption of Pauli noise
Throughout the paper, we made the simplifying assumption of Pauli noise, which does not hold in general. Consider a Clifford quantum circuit layer on qubits and its noisy version . A more realistic description of the noise is given by
(16) |
where the are Kraus operators nielsen_and_chuang , which leads to
(17) |
Applying Pauli twirling knill_randomized_2008 ; dankert_exact_2009 ; magesan_scalable_2011 , i.e., averaging over conjugated by each element of the Pauli group on qubits yields
(18) |
for Paulis with for all . This is known to translate, on average, the more general noise given in (17) into a Pauli noise model as given in (1). In practice, we do not enumerate all Paulis; instead, we uniformly sample a finite number of random Paulis to approximate the average.
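A small single-qubit check of this statement: twirling a non-Pauli channel over the Pauli group yields a Pauli channel whose error rates are the diagonal of the process matrix, $p_P = \sum_k |\mathrm{Tr}(P K_k)/2|^2$. In particular, $p_I$ is the no-error probability of the twirled model. The amplitude-damping channel used here is a hypothetical example of non-Pauli noise chosen for illustration.

```python
import numpy as np

# Single-qubit Paulis (Hermitian, so P^dagger = P).
PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}


def apply_kraus(kraus, rho):
    """Apply the channel rho -> sum_k K_k rho K_k^dagger."""
    return sum(k @ rho @ k.conj().T for k in kraus)


def twirl(kraus, rho):
    """Pauli-twirled channel: (1/4) sum_P P Lambda(P rho P) P."""
    return sum(p @ apply_kraus(kraus, p @ rho @ p) @ p for p in PAULIS.values()) / 4


def pauli_error_rates(kraus):
    """Pauli-channel probabilities p_P = sum_k |Tr(P K_k) / 2|^2 of the twirled channel."""
    return {name: sum(abs(np.trace(p @ k) / 2) ** 2 for k in kraus)
            for name, p in PAULIS.items()}


def pauli_channel(rates, rho):
    """Apply rho -> sum_P p_P P rho P."""
    return sum(rates[name] * (p @ rho @ p) for name, p in PAULIS.items())


# Hypothetical non-Pauli noise: amplitude damping with strength gamma.
gamma = 0.1
kraus = [np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex),
         np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)]

rng = np.random.default_rng(seed=7)
a = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho = a @ a.conj().T
rho /= np.trace(rho)                       # random single-qubit density matrix

rates = pauli_error_rates(kraus)
print({name: round(r, 4) for name, r in rates.items()})
print(np.allclose(twirl(kraus, rho), pauli_channel(rates, rho)))  # True
```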
Suppose now we have a noise model that, on average, looks like Pauli noise. Then, expectation values take the same value for a true Pauli noise model and for a twirled general model. This also holds if we set , i.e., if we evaluate the probability of sampling . Consequently, the sampling probabilities under the actual Pauli noise model and under the twirled noise model must also be the same.
For the experiments in Sec. V, we omitted twirling. There are special cases of noise models for which we know that the theory holds exactly. For instance, suppose the noise is stochastic wallman2016bounding and all other , for are orthogonal to . Then, it is easy to see that the probability of having no error is equal to the probability of the Pauli noise resulting after twirling, i.e., equal to . While we can always construct a noise channel with all orthogonal Kraus operators, it is not guaranteed that the identity is part of it. In general, we can only say that the probability of no error in the general noise model is less than or equal to Wallman_2016 ; wallman2016bounding .
However, the gap between the twirled and untwirled circuits appears to be very small in the considered cases. We demonstrate this by comparing the distributions obtained with and without twirling. In Fig. 5, we show the experimental distributions obtained when sampling the same 127-qubit circuits discussed in Sec. V.2 on the ibm_sherbrooke device. They show close agreement with and without twirling.
We note that the observed distributions in Fig. 5 deviate slightly from those presented in Fig. 3. This is because twirling the circuits requires inserting additional single-qubit gates, which make the circuits slightly deeper; here, the pulse schedule is about 8% longer than for the original circuits. In some cases, this could be reduced by combining the twirling gates with other single-qubit gates. However, this is not possible if the additional gates are inserted, e.g., between two CNOT gates. The untwirled circuits have the same structure as the twirled ones, except that the twirling gates are fixed instead of randomly sampled, so that both cases incur the same additional circuit duration and the comparison between them is fair.
We also note that the minimum values of the objective function are lower for the twirled case than for the untwirled case. However, since the opposite is true for the mean value of the objective function, we believe this may be due to sampling statistics, as in each of these cases the minimum objective value was sampled only once. If we determine as before for each case, we find that the twirled and untwirled values agree well for each , and are well within a standard deviation of each other (determined by bootstrapping the observed bitstrings). This is summarized in Tab. 3.
![Refer to caption](x5.png)
p | Twirling | Noise-free mean (MPS) | Noisy mean | Noisy minimum | Effective α (rel. std.) |
---|---|---|---|---|---|
1 | No | -79.8 | -60.8 | -128 | 0.147 (7.9%) |
1 | Yes | -79.8 | -60.9 | -144 | 0.152 (7.7%) |
2 | No | -109.4 | -74.9 | -144 | 0.0202 (22.4%) |
2 | Yes | -109.4 | -72.9 | -148 | 0.0160 (25.7%) |
Appendix B Probabilistic Error Cancellation & Sampling
In this section, we discuss how applying PEC berg2023probabilistic to quantum circuits affects the resulting sampling probabilities. PEC consists of two steps: learning the noise when running a quantum circuit on a particular quantum device, and then, mitigating the noise to get an unbiased estimator of an expectation value. Here, we assume we have learned the noise already and focus on the error mitigation. Given a noise model , PEC constructs a Quasiprobability Decomposition (QPD) to implement the inverse noise by combining multiple weighted quantum circuits.
In a QPD, a quantum operation is implemented as a linear combination of other (possibly noisy) operations , ,
(19) |
where , , denote (noisy) operations, and . This was first proposed in the context of error mitigation Temme_2017 , where is assumed to be a noise-free operation and are noisy operations that can be implemented on a noisy device. If this is applied to multiple gates and qubits, the number of necessary operations grows exponentially. Thus, instead of enumerating all of them, one rewrites (19) as
(20) |
where , , and , and samples from the probability distribution defined through . Suppose we are interested in estimating for some initial state and observable . Then, we can use the QPD to write
(21) |
Thus, instead of enumerating all circuits, we can sample from , and only evaluate the sampled circuits corresponding to , to get an unbiased estimator for . However, the variance of this estimation is amplified by , i.e., -times more samples are needed than for the original noise-free circuit to achieve an estimate of the same accuracy. The sampling overhead grows exponentially in the number of qubits and depth of the circuit, and thus, can be prohibitively large for circuits beyond a certain circuit size and noise levels.
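The sampling scheme behind (20) and (21) can be illustrated with a toy Monte Carlo. We assume the usual QPD notation: quasiprobabilities $a_i$, overhead factor $\gamma = \sum_i |a_i|$, sampling probabilities $p_i = |a_i|/\gamma$, and single-shot $\pm 1$ outcomes. Each sample takes the value $\pm\gamma$, so the estimator is unbiased but its single-shot variance is $\gamma^2 - \langle O \rangle^2$, which is the source of the $\gamma^2$ sampling overhead. The coefficients and expectation values are hypothetical.

```python
import numpy as np


def qpd_estimate(coeffs, expvals, shots, rng):
    """Monte Carlo estimate of sum_i a_i <O>_i from a quasiprobability decomposition.

    coeffs  -- real quasiprobabilities a_i (may be negative)
    expvals -- <O>_i in [-1, 1] of the circuits implementing the (noisy) operations
    Returns (estimate, single-shot variance, gamma).
    """
    coeffs = np.asarray(coeffs, dtype=float)
    expvals = np.asarray(expvals, dtype=float)
    gamma = np.abs(coeffs).sum()                  # sampling-overhead factor
    probs = np.abs(coeffs) / gamma                # p_i = |a_i| / gamma
    idx = rng.choice(len(coeffs), size=shots, p=probs)
    # One shot of circuit i yields +1 with probability (1 + <O>_i) / 2, otherwise -1.
    outcomes = np.where(rng.random(shots) < (1 + expvals[idx]) / 2, 1.0, -1.0)
    samples = gamma * np.sign(coeffs[idx]) * outcomes
    return samples.mean(), samples.var(), gamma


rng = np.random.default_rng(seed=5)
coeffs, expvals = [1.3, -0.2, -0.1], [0.6, 0.2, -0.4]
exact = float(np.dot(coeffs, expvals))            # approximately 0.78
est, var, gamma = qpd_estimate(coeffs, expvals, shots=200_000, rng=rng)
print(exact, est, var, gamma**2 - exact**2)       # var matches gamma^2 - exact^2
```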
While PEC has so far only been considered for the estimation of expectation values, it also generates samples from every random circuit that is measured. However, we will show that sampling in this way essentially amplifies the noise and increases the sampling overhead compared to the results presented in this paper. To this end, we introduce the following mixed state induced by PEC:
(22) |
for some initial state . The state is obtained by dropping the factor as well as the signs from (20). This allows us to state the following lemma.
Lemma 2.
Suppose a -qubit state , where is some unitary, with
(23) |
for a computational basis state , .
Further, suppose that can be error-mitigated on a noisy device by using PEC with corresponding , and denote the resulting mixed state introduced in (22) by . Then, the probability of measuring on the noisy device using PEC is lower bounded by
(24) |
Proof.
Consider the QPD resulting from PEC
(25) |
Using (25) we can write
(26) |
By defining , , and , we can rewrite (26) as
(27) |
Further, allows us to define a random variable that equals if we measure and obtain , where the sign is determined by , and otherwise. The random variable satisfies . We denote the probabilities of taking the values by , respectively. Note that by construction, for each only one of can be larger than zero.
In addition, let the probabilities define a random variable . Then, by the law of total expectation, we get
(28) | ||||
(29) | ||||
(30) |
This can be rewritten as
(31) |
The total probability to measure when applying PEC, independent of the sign of , is then given by
(32) |
where the lower bound follows immediately from (31), and the right-hand side is exactly the probability of measuring for state . ∎
If we compare the result from Lemma 2 with the lower bound presented in (2), we see that PEC incurs a squared overhead compared to direct sampling. This further implies that CVaR-based approaches may significantly reduce the overhead needed to achieve insightful results, particularly when combined with problem structure to filter noisy samples.
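A back-of-the-envelope comparison of the two overheads, as a sketch under simplifying assumptions: we model each directly sampled shot as error-free with probability LF (the circuit's layer fidelity), so collecting $k$ noise-free samples takes about $k/\mathrm{LF}$ shots, and we take the PEC overhead as the square of that, following the comparison above. The numbers are hypothetical.

```python
import numpy as np


def shots_for_good_samples(k, layer_fidelity, pec=False):
    """Expected shots to obtain k noise-free samples.

    Direct sampling: each shot is assumed error-free with probability LF, so ~k / LF.
    PEC: the overhead is squared, so ~k / LF**2.
    """
    overhead = 1.0 / layer_fidelity
    return k * overhead ** (2 if pec else 1)


k, lf = 1_000, 0.05
direct = shots_for_good_samples(k, lf)
pec = shots_for_good_samples(k, lf, pec=True)
print(direct, pec)  # about 2e4 versus 4e5 shots

# Monte Carlo check of the direct case: count error-free shots among `direct` shots.
rng = np.random.default_rng(seed=3)
good = rng.binomial(int(round(direct)), lf, size=200)
print(good.mean())  # close to k
```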
Appendix C Variance of Estimating the CVaR
In this section, we briefly explain how to estimate the CVaR. We first state the following lemma.
Lemma 3.
Let be i.i.d. copies of (with integrable) and let be their order statistic. For let . Then
If is square integrable and ,
in distribution as where here is the limiting variance.
To estimate , we use the estimator and obtain analogous results.
Proof.
Recall and define . We make the following definitions for (left limits) of empirical cumulative distribution functions:
Also let and . The key observation is that
Indeed, any will appear in the sum defining precisely times; the in the denominator above takes care of overcounting. Now
where
The first equality above follows from the linearity of the expectation and the i.i.d. property of and the second equality follows from conditioning on . Using the strong law of large numbers we have a.s. as . By separately considering the and cases we get
as unless ; however we have so this case does not matter to evaluate the limit of . Thus by dominated convergence
as . The second claim on the central limit theorem is a special case of cvarestimator_clt . ∎
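For reference, a common empirical estimator of the lower-tail CVaR averages the smallest $\alpha$-fraction of the empirical probability mass, giving the boundary sample a fractional weight when $\alpha N$ is not an integer. This is a sketch of one standard estimator; it may differ in finite-sample details from the estimator used in Lemma 3, and the function name is our own.

```python
import math

import numpy as np


def empirical_cvar(samples, alpha):
    """Lower-tail CVaR_alpha of the empirical distribution of `samples`.

    Each sample carries mass 1/N; the smallest alpha-fraction of the mass is averaged,
    with a fractional weight on the boundary sample if alpha * N is not an integer.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    x = np.sort(np.asarray(samples, dtype=float))
    mass = alpha * len(x)          # number of samples (possibly fractional) to average
    k = math.floor(mass)           # samples entering with full weight
    total = x[:k].sum()
    if mass > k:                   # partial weight for the (k+1)-th smallest sample
        total += (mass - k) * x[k]
    return total / mass


print(empirical_cvar([5, 1, 3, 2, 4], alpha=0.4))  # mean of {1, 2} -> 1.5
print(empirical_cvar([5, 1, 3, 2, 4], alpha=0.5))  # (1 + 2 + 0.5 * 3) / 2.5 -> 1.8
```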
Let us make the following remark on monotonicity: If is non-decreasing and is integrable, then
By applying this to and we see that . Furthermore, by replacing by a random variable sampled from the law of conditioned on for we can deduce that is non-decreasing in . Much more crudely, we can bound .
In the following, we analyze the behavior of the limiting distribution of the estimator in some concrete cases.
In the case where has a Bernoulli distribution with success probability , we observe that has the same distribution as where is Binomial distributed with parameter . An application of the central limit theorem thus yields
in distribution as where is a standard normal random variable.
To analyze the case where , it will be useful to recall the following asymptotic expansion of incomplete Gamma functions (nist_dlmf , §8.11(i)):
as for any fixed and . In particular as ,
Let and write for the density of . By (nist_dlmf , §7.17(iii)), we get the asymptotic relationship
as . We will compute and via the cumulant generating function of a truncated Gaussian
Differentiating at yields the expressions
Since , it follows that as ,
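As a sanity check on the Gaussian case, the lower-tail CVaR of a normal distribution has the well-known closed form $\mathrm{CVaR}_\alpha = \mu - \sigma\,\varphi(\Phi^{-1}(\alpha))/\alpha$, where $\varphi$ and $\Phi$ are the standard normal density and distribution function. The sketch below compares it with the sample estimator on simulated data; it checks only the mean of the estimator, not the limiting variance derived above, and the sample size and seed are arbitrary choices.

```python
from statistics import NormalDist

import numpy as np


def gaussian_cvar(alpha, mu=0.0, sigma=1.0):
    """Closed-form lower-tail CVaR of N(mu, sigma^2): mu - sigma * phi(q_alpha) / alpha."""
    std = NormalDist()
    q = std.inv_cdf(alpha)                  # alpha-quantile of the standard normal
    return mu - sigma * std.pdf(q) / alpha


def sample_cvar(samples, alpha):
    """Mean of the smallest ceil(alpha * N) samples."""
    x = np.sort(samples)
    return x[: max(1, int(np.ceil(alpha * len(x))))].mean()


rng = np.random.default_rng(seed=11)
alpha = 0.05
print(gaussian_cvar(alpha))                                 # about -2.063
print(sample_cvar(rng.standard_normal(1_000_000), alpha))   # close to the value above
```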
As a final example, we can consider the case where has density where (i.e., we consider a power law tail). Here, one can compute that for , which is worse than the decay in the standard normal case and achieves the worst case upper bound on the variance in the limit.
Appendix D Relation to brute-force search
A brute-force search enumerates all candidate solutions and checks which one is optimal. The sampling overhead of on noisy devices can thus be related to brute-force search, which allows us to derive a hardware requirement for QAOA. Assuming, for simplicity, that the probability to sample the optimal solution is close to , we require hardware with . We can relate this to the layer fidelity to obtain a requirement on the hardware quality necessary for a potential quantum advantage. First, we assume that each layer in a QAOA circuit has the same layer fidelity . As a result, the of the circuit is , where is the depth defined as the number of non-overlapping two-qubit gate layers. This assumption is reasonable when transpiling QAOA circuits to a line of qubits, which requires layers of CNOT gates applied on every other edge weidenfeller2022scaling . Therefore, the sampling cost to compensate for noise is . For a line of qubits, we may assume that to leading order . The factor comes from the fact that layers of SWAP gates are needed to implement full connectivity and each SWAP merged with an is implemented with three CNOT gates. Here, is the number of QAOA layers, which is sometimes assumed to grow with the logarithm of the problem size, i.e., Bravyi2019 ; weidenfeller2022scaling . If the sampling overhead should stay below that of brute-force search, we therefore require , which implies that the layer fidelity must satisfy
(33) |
This requirement depends on the problem size only through the relation between and . However, as shown in Ref. mckay2023benchmarking , the layer fidelity decreases with the number of qubits in the layer. If we further assume that layers are dense, i.e., every layer on qubits consists of approximately CNOT gates, we can compute a corresponding CNOT fidelity as , as well as the corresponding lower bound
(34) |
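The arithmetic of this appendix can be sketched as follows, under the assumptions spelled out in the text as we read them: a depth of $d \approx 3np$ CNOT layers on a line of $n$ qubits, a circuit layer fidelity of $\mathrm{LF}^d$, a brute-force cost of $2^n$, and dense layers of $n/2$ CNOTs with $\mathrm{LF} = F^{n/2}$. Requiring $\mathrm{LF}^{-3np} \le 2^n$ gives $\mathrm{LF} \ge 2^{-1/(3p)}$, in which $n$ cancels. The choice $p = \lceil \log_2 n \rceil$ in the usage loop is purely illustrative.

```python
import math


def min_layer_fidelity(p):
    """Per-layer LF with LF**(-3 n p) <= 2**n, i.e. LF >= 2**(-1/(3p)); n cancels."""
    return 2.0 ** (-1.0 / (3 * p))


def min_cnot_fidelity(n, p):
    """CNOT fidelity F such that a dense layer of n/2 CNOTs has LF = F**(n/2)."""
    return min_layer_fidelity(p) ** (2.0 / n)


for n in (100, 1_000):
    p = math.ceil(math.log2(n))          # illustrative choice p ~ log(n)
    print(n, p, round(min_layer_fidelity(p), 4), round(min_cnot_fidelity(n, p), 6))
```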
Appendix E 40-qubit Circuits
The 40-qubit circuits in the main text are based on those in Ref. Sack2023 . In this work, the authors consider random three-regular graphs transpiled to a line of qubits using a swap network weidenfeller2022scaling . This results in circuits that alternate only two types of layers of CNOT gates, as described in the main text. Furthermore, the authors carefully chose the mapping from decision variables to physical qubits to minimize the number of layers of the swap network. This method is described in Ref. Matsuo2023 . The code to produce such circuits is available on GitHub BestPractices . The optimal parameters resulting from the light-cone optimization are given by for and for , respectively.
Appendix F Dynamical Decoupling
Dynamical decoupling (DD) removes an interaction between a system and a bath by inserting pulses Viola1998 ; Zanardi1999 ; Vitali1999 . Here, we briefly summarize DD following Ref. Ezzell2023 . Consider a time-independent bath interacting with the system through . Here, is an undesired, always-on error term. The goal of DD is to insert pulses in idle times such that the time evolution of the system and bath becomes , with the desired error-free time evolution and ideally acting on the bath alone.
Consider a single qubit with . Here, is a coefficient, and is the bath term that couples to the qubit through the Pauli matrix. The simplest DD sequence is where indicates a delay of duration . Since anti-commutes with and , the sequence cancels the and system-bath interactions. The effective error Hamiltonian after a duration is . Here, we see that is not universal since an error remains. Universal decoupling up to first-order is achieved with the sequence
(35) |
which results in the effective error Hamiltonian .
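The cancellation argument can be checked with a first-order (average Hamiltonian) calculation: during the $j$-th free period, the error term appears in the toggling frame conjugated by the product of all pulses applied before it, and the first-order effective Hamiltonian is the average over periods. The sketch below treats the error as a single-qubit Pauli and assumes ideal, instantaneous pulses; the XY4 ordering X, Y, X, Y is our assumption for illustration.

```python
import numpy as np

PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}


def average_hamiltonian(pulses, error):
    """First-order average of `error` for the sequence tau-P1-tau-P2-...-tau-Pk.

    The j-th free period sees U_j^dagger @ error @ U_j, where U_j = P_{j-1} ... P_1
    is the product of all (instantaneous) pulses applied before that period.
    """
    frame = PAULI["I"]
    total = np.zeros((2, 2), dtype=complex)
    for name in pulses:
        total += frame.conj().T @ error @ frame
        frame = PAULI[name] @ frame          # pulse applied at the end of the period
    return total / len(pulses)


for seq in ("XX", "XYXY"):
    residual = [name for name in "XYZ"
                if not np.allclose(average_hamiltonian(seq, PAULI[name]), 0)]
    print(seq, "leaves first-order errors along:", residual)
# XX leaves ['X']; XYXY (XY4) leaves []
```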
We now consider the two-qubit case. Two fixed-frequency qubits typically exhibit an undesired -coupling, which is effectively suppressed with DD Tripathi2022 ; Mundada2023 . Simultaneously applying the sequence on both qubits cancels unwanted errors arising from and . However, since simultaneous pulses commute with the unwanted interactions, these interactions (i.e., cross-talk, which is common in transmon qubits) remain. This is remedied with staggered DD. We apply the sequence
(36) |
which staggers two sequences. Here, is an gate applied to qubit , which inverts the evolution of and . In total, the evolution of single-qubit errors changes sign twice and the evolution of errors changes sign four times. In this work we apply the staggered XY4 sequence zhou_quantum_2022 (a variant of the staggered XX sequence presented in Mundada2023 ) to ensure a proper cancellation of two-qubit static cross-talk. The staggered XY4 sequence we employ is defined by . As discussed above, it is universal for single-qubit terms and will cancel the static cross-talk between qubits.
Appendix G 127-qubit QAOA Circuits
![Refer to caption](x6.png)
Figure 6, taken from Refs. pelofske2023qavsqaoa ; pelofske2023short , summarizes the optimized circuits for the 127-qubit higher-order instances to keep the description self-contained. It illustrates that all 2-qubit gates needed for the implementation of can be scheduled in just 3 different layers of non-overlapping CNOT gates. In each QAOA round , each layer is used once to compute and once to uncompute the and parity values, for an overall CNOT depth of $6p$. The exact values of the QAOA angles, computed heuristically using parameter transfer, that give a strictly increasing expectation value as increases up to are given in Ref. heavy_hex_QAOA_parameter_transfer2023 .
References
- (1) A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O’Brien. A variational eigenvalue solver on a photonic quantum processor. Nature Communications, 5(1), 2014. DOI: 10.1038/ncomms5213.
- (2) P. J. Ollitrault, A. Miessen, and I. Tavernelli. Molecular Quantum Dynamics: A Quantum Computing Perspective. Accounts of Chemical Research, 54(23):4229–4238, 2021. DOI: 10.1021/acs.accounts.1c00514. PMID: 34787398.
- (3) A. Di Meglio, K. Jansen, I. Tavernelli, C. Alexandrou, S. Arunachalam, C. W. Bauer, K. Borras, S. Carrazza, A. Crippa, V. Croft, R. de Putter, A. Delgado, V. Dunjko, D. J. Egger, E. Fernandez-Combarro, E. Fuchs, L. Funcke, D. Gonzalez-Cuadra, M. Grossi, J. C. Halimeh, Z. Holmes, S. Kuhn, D. Lacroix, R. Lewis, D. Lucchesi, M. L. Martinez, F. Meloni, A. Mezzacapo, S. Montangero, L. Nagano, V. Radescu, E. R. Ortega, A. Roggero, J. Schuhmacher, J. Seixas, P. Silvi, P. Spentzouris, F. Tacchino, K. Temme, K. Terashi, J. Tura, C. Tuysuz, S. Vallecorsa, U.-J. Wiese, S. Yoo, and J. Zhang. Quantum Computing for High-Energy Physics: State of the Art and Challenges. Summary of the QC4HEP Working Group, 2023. DOI: 10.48550/arXiv.2307.03236.
- (4) P. K. Barkoutsos, F. Gkritsis, P. J. Ollitrault, I. O. Sokolov, S. Woerner, and I. Tavernelli. Quantum algorithm for alchemical optimization in material design. Chemical Science, 12(12):4345–4352, 2021. DOI: 10.1039/D0SC05718E.
- (5) V. Havlicek, A. D. Corcoles, K. Temme, A. W. Harrow, A. Kandala, J. M. Chow, and J. M. Gambetta. Supervised learning with quantum-enhanced feature spaces. Nature, 567:209 – 212, 2019. DOI: 10.1038/s41586-019-0980-2.
- (6) C. Zoufal, A. Lucchi, and S. Woerner. Quantum Generative Adversarial Networks for learning and loading random distributions. npj Quantum Information, 5(1), 2019. DOI: 10.1038/s41534-019-0223-2.
- (7) C. Zoufal, A. Lucchi, and S. Woerner. Variational quantum Boltzmann machines. Quantum Machine Intelligence, 3(1), 2021. DOI: 10.1007/s42484-020-00033-7.
- (8) E. Farhi, J. Goldstone, and S. Gutmann. A Quantum Approximate Optimization Algorithm, 2014. DOI: 10.48550/arXiv.1411.4028.
- (9) S. Bravyi, A. Kliesch, R. Koenig, and E. Tang. Obstacles to Variational Quantum Optimization from Symmetry Protection. Physical Review Letters, 125(26):260505, 2020. DOI: 10.1103/PhysRevLett.125.260505.
- (10) D. J. Egger, J. Mareček, and S. Woerner. Warm-starting quantum optimization. Quantum, 5:479, 2021. DOI: 10.22331/q-2021-06-17-479.
- (11) S. H. Sack and D. J. Egger. Large-scale quantum approximate optimization on non-planar graphs with machine learning noise mitigation, 2023. DOI: 10.48550/arXiv.2307.14427.
- (12) S. Woerner and D. J. Egger. Quantum risk analysis. npj Quantum Information, 5(1), 2019. DOI: 10.1038/s41534-019-0130-6.
- (13) E. Yndurain, S. Woerner, and D. Egger. Exploring quantum computing use cases for financial services, 2019. Available online: https://www.ibm.com/downloads/cas/2YPRZPB3 [accessed 21-Nov-2023].
- (14) N. Stamatopoulos, G. Mazzola, S. Woerner, and W. J. Zeng. Towards quantum advantage in financial market risk using quantum gradient algorithms. Quantum, 6:770, 2022. DOI: 10.22331/q-2022-07-20-770.
- (15) M. A. Nielsen and I. L. Chuang. Quantum Computation and Quantum Information: 10th Anniversary Edition. Cambridge University Press, 2011.
- (16) D. A. Lidar and T. A. Brun. Quantum Error Correction. Cambridge University Press, 2013. DOI: 10.1017/CBO9781139034807.
- (17) E. van den Berg, Z. K. Minev, A. Kandala, and K. Temme. Probabilistic error cancellation with sparse Pauli-Lindblad models on noisy quantum processors. Nature Physics, 19:1116–1121, 2023. DOI: 10.1038/s41567-023-02042-2.
- (18) C. Piveteau, D. Sutter, and S. Woerner. Quasiprobability decompositions with reduced sampling overhead. npj Quantum Information, 8(1), 2022. DOI: 10.1038/s41534-022-00517-3.
- (19) K. Temme, S. Bravyi, and J. M. Gambetta. Error Mitigation for Short-Depth Quantum Circuits. Physical Review Letters, 119(18), 2017. DOI: 10.1103/physrevlett.119.180509.
- (20) Y. Quek, D. S. França, S. Khatri, J. J. Meyer, and J. Eisert. Exponentially tighter bounds on limitations of quantum error mitigation, 2023. DOI: 10.48550/arXiv.2210.11505.
- (21) Y. Kim, A. Eddins, S. Anand, K. X. Wei, E. van den Berg, S. Rosenblatt, H. Nayfeh, Y. Wu, M. Zaletel, K. Temme, and A. Kandala. Evidence for the utility of quantum computing before fault tolerance. Nature, 618:500–505, 2023. DOI: 10.1038/s41586-023-06096-3.
- (22) S. Anand, K. Temme, A. Kandala, and M. Zaletel. Classical benchmarking of zero noise extrapolation beyond the exactly-verifiable regime, 2023. DOI: 10.48550/arXiv.2306.17839.
- (23) S. Bravyi, O. Dial, J. M. Gambetta, D. Gil, and Z. Nazario. The future of quantum computing with superconducting qubits. Journal of Applied Physics, 132(16), 2022. DOI: 10.1063/5.0082975.
- (24) C. Zoufal, R. V. Mishmash, N. Sharma, N. Kumar, A. Sheshadri, A. Deshmukh, N. Ibrahim, J. Gacon, and S. Woerner. Variational quantum algorithm for unconstrained black box binary optimization: Application to feature selection. Quantum, 7:909, 2023. DOI: 10.22331/q-2023-01-26-909.
- (25) A. Letcher, S. Woerner, and C. Zoufal. From Tight Gradient Bounds for Parameterized Quantum Circuits to the Absence of Barren Plateaus in QGANs, 2023. DOI: 10.48550/arXiv.2309.12681.
- (26) P. K. Barkoutsos, G. Nannicini, A. Robert, I. Tavernelli, and S. Woerner. Improving variational quantum optimization using CVaR. Quantum, 4:256, 2020. DOI: 10.22331/q-2020-04-20-256.
- (27) J. Wurtz and P. Love. MaxCut quantum approximate optimization algorithm performance guarantees for p > 1. Physical Review A, 103(4):042612, 2021. DOI: 10.1103/PhysRevA.103.042612.
- (28) E. Knill, D. Leibfried, R. Reichle, J. Britton, R. B. Blakestad, J. D. Jost, C. Langer, R. Ozeri, S. Seidelin, and D. J. Wineland. Randomized benchmarking of quantum gates. Physical Review A, 77(1), 2008. DOI: 10.1103/PhysRevA.77.012307. Publisher: American Physical Society.
- (29) C. Dankert, R. Cleve, J. Emerson, and E. Livine. Exact and approximate unitary 2-designs and their application to fidelity estimation. Physical Review A, 80(1):012304, 2009. DOI: 10.1103/PhysRevA.80.012304. Publisher: American Physical Society.
- (30) E. Magesan, J. M. Gambetta, and J. Emerson. Scalable and Robust Randomized Benchmarking of Quantum Processes. Physical Review Letters, 106(18):180504, 2011. DOI: 10.1103/PhysRevLett.106.180504. Publisher: American Physical Society.
- (31) D. Zwillinger and S. Kokoska. CRC Standard Probability and Statistics Tables and Formulae. CRC Press, 2000. DOI: 10.1201/b16923.
- (32) D. C. McKay, I. Hincks, E. J. Pritchett, M. Carroll, L. C. G. Govia, and S. T. Merkel. Benchmarking Quantum Processor Performance at Scale, 2023. DOI: 10.48550/arXiv.2311.05933.
- (33) E. van den Berg, Z. K. Minev, and K. Temme. Model-free readout-error mitigation for quantum expectation values. Physical Review A, 105(3), 2022. DOI: 10.1103/physreva.105.032620.
- (34) P. D. Nation, H. Kang, N. Sundaresan, and J. M. Gambetta. Scalable Mitigation of Measurement Errors on Quantum Computers. PRX Quantum, 2(4), 2021. DOI: 10.1103/prxquantum.2.040326.
- (35) X. Bonet-Monroig, R. Sagastizabal, M. Singh, and T. E. O'Brien. Low-cost error mitigation by symmetry verification. Physical Review A, 98(6), 2018. DOI: 10.1103/physreva.98.062339.
- (36) A. Choquette, A. Di Paolo, P. K. Barkoutsos, D. Sénéchal, I. Tavernelli, and A. Blais. Quantum-optimal-control-inspired ansatz for variational quantum algorithms. Physical Review Research, 3(2), 2021. DOI: 10.1103/physrevresearch.3.023092.
- (37) J. Weidenfeller, L. C. Valor, J. Gacon, C. Tornow, L. Bello, S. Woerner, and D. J. Egger. Scaling of the quantum approximate optimization algorithm on superconducting qubit based hardware. Quantum, 6:870, 2022. DOI: 10.22331/q-2022-12-07-870.
- (38) G. Gentinetta, A. Thomsen, D. Sutter, and S. Woerner. The complexity of quantum support vector machines, 2022. DOI: 10.48550/arXiv.2203.00031.
- (39) G. Gentinetta, D. Sutter, C. Zoufal, B. Fuller, and S. Woerner. Quantum Kernel Alignment with Stochastic Gradient Descent, 2023. DOI: 10.48550/arXiv.2304.09899.
- (40) S. McArdle, T. Jones, S. Endo, Y. Li, S. C. Benjamin, and X. Yuan. Variational ansatz-based quantum simulation of imaginary time evolution. npj Quantum Information, 5(1), 2019. DOI: 10.1038/s41534-019-0187-2.
- (41) X. Yuan, S. Endo, Q. Zhao, Y. Li, and S. C. Benjamin. Theory of variational quantum simulation. Quantum, 3:191, 2019. DOI: 10.22331/q-2019-10-07-191.
- (42) C. Zoufal, D. Sutter, and S. Woerner. Error bounds for variational quantum time evolution. Physical Review Applied, 20(4), 2023. DOI: 10.1103/physrevapplied.20.044059.
- (43) J. Gacon, C. Zoufal, G. Carleo, and S. Woerner. Simultaneous Perturbation Stochastic Approximation of the Quantum Fisher Information. Quantum, 5:567, 2021. DOI: 10.22331/q-2021-10-20-567.
- (44) J. Gacon, C. Zoufal, G. Carleo, and S. Woerner. Stochastic Approximation of Variational Quantum Imaginary Time Evolution, 2023. DOI: 10.48550/arXiv.2305.07059.
- (45) J. Gacon, J. Nys, R. Rossi, S. Woerner, and G. Carleo. Variational quantum time evolution without the quantum geometric tensor, 2023. DOI: 10.48550/arXiv.2303.12839.
- (46) B. Fuller, C. Hadfield, J. R. Glick, T. Imamichi, T. Itoko, R. J. Thompson, Y. Jiao, M. M. Kagele, A. W. Blom-Schieber, R. Raymond, and A. Mezzacapo. Approximate Solutions of Combinatorial Problems via Quantum Relaxations, 2021. DOI: 10.48550/arXiv.2111.03167.
- (47) K. Teramoto, R. Raymond, E. Wakakuwa, and H. Imai. Quantum-Relaxation Based Optimization Algorithms: Theoretical Extensions, 2023. DOI: 10.48550/arXiv.2302.09481.
- (48) T. L. Patti, J. Kossaifi, A. Anandkumar, and S. F. Yelin. Variational quantum optimization with multibasis encodings. Physical Review Research, 4(3):033142, 2022. DOI: 10.1103/PhysRevResearch.4.033142.
- (49) A. Lucas. Ising formulations of many NP problems. Frontiers in Physics, 2, 2014. DOI: 10.3389/fphy.2014.00005.
- (50) M. Streif and M. Leib. Training the quantum approximate optimization algorithm without access to a quantum processing unit. Quantum Science and Technology, 5(3):034008, 2020. DOI: 10.1088/2058-9565/ab8c2b.
- (51) S. H. Sack and M. Serbyn. Quantum annealing initialization of the quantum approximate optimization algorithm. Quantum, 5:491, 2021. DOI: 10.22331/q-2021-07-01-491.
- (52) T. Begušić, K. Hejazi, and G. K.-L. Chan. Simulating quantum circuit expectation values by clifford perturbation theory, 2023. DOI: 10.48550/arXiv.2306.04797.
- (53) S. Hadfield, Z. Wang, B. O’Gorman, E. G. Rieffel, D. Venturelli, and R. Biswas. From the Quantum Approximate Optimization Algorithm to a Quantum Alternating Operator Ansatz. Algorithms, 12(2):34, 2019. DOI: 10.3390/a12020034.
- (54) Z. Wang, N. C. Rubin, J. M. Dominy, and E. G. Rieffel. XY mixers: Analytical and numerical results for the quantum alternating operator ansatz. Physical Review A, 101(1):012320, 2020. DOI: 10.1103/PhysRevA.101.012320.
- (55) J. Cook, S. Eidenbenz, and A. Bärtschi. The Quantum Alternating Operator Ansatz on Maximum k-Vertex Cover. In IEEE International Conference on Quantum Computing & Engineering QCE’20, pages 83–92, 2020. DOI: 10.1109/QCE49297.2020.00021.
- (56) J. Golden, A. Bärtschi, S. Eidenbenz, and D. O’Malley. Numerical Evidence for Exponential Speed-up of QAOA over Unstructured Search for Approximate Constrained Optimization. In IEEE International Conference on Quantum Computing & Engineering QCE’23, pages 496–505, 2023. DOI: 10.1109/QCE57702.2023.00063.
- (57) A. Bärtschi and S. Eidenbenz. Short-Depth Circuits for Dicke State Preparation. In IEEE International Conference on Quantum Computing & Engineering QCE’22, pages 87–96, 2022. DOI: 10.1109/QCE53715.2022.00027.
- (58) A. Bärtschi and S. Eidenbenz. Grover Mixers for QAOA: Shifting Complexity from Mixer Design to State Preparation. In IEEE International Conference on Quantum Computing & Engineering QCE’20, pages 72–82, 2020. DOI: 10.1109/QCE49297.2020.00020.
- (59) Y. Liu, S. Arunachalam, and K. Temme. A rigorous and robust quantum speed-up in supervised machine learning. Nature Physics, 17(9):1013–1017, 2021. DOI: 10.1038/s41567-021-01287-z.
- (60) IBM Quantum. IBM Quantum Platform - Compute resources. https://quantum-computing.ibm.com/services/resources, 2023. [Online; accessed 20-Nov-2023].
- (61) S. Sheldon, E. Magesan, J. M. Chow, and J. M. Gambetta. Procedure for systematically tuning up cross-talk in the cross-resonance gate. Physical Review A, 93(6):060302, 2016. DOI: 10.1103/PhysRevA.93.060302.
- (62) At the time of writing, the experiment to measure layer fidelity is being implemented in Qiskit Experiments (81). See https://github.com/Qiskit-Extensions/qiskit-experiments.
- (63) E. Pelofske, A. Bärtschi, and S. Eidenbenz. Quantum Annealing vs. QAOA: 127 Qubit Higher-Order Ising Problems on NISQ Computers. In International Conference on High Performance Computing ISC HPC’23, pages 240–258, 2023. DOI: 10.1007/978-3-031-32041-5_13.
- (64) E. Pelofske, A. Bärtschi, and S. Eidenbenz. Short-Depth QAOA circuits and Quantum Annealing on Higher-Order Ising Models. npj Quantum Information, 2023. DOI: 10.2172/1985256. Accepted.
- (65) C. Chamberland, G. Zhu, T. J. Yoder, J. B. Hertzberg, and A. W. Cross. Topological and subsystem codes on low-degree graphs with flag qubits. Physical Review X, 10(1):011022, 2020. DOI: 10.1103/PhysRevX.10.011022.
- (66) E. Pelofske, A. Bärtschi, L. Cincio, J. Golden, and S. Eidenbenz. Scaling Whole-Chip QAOA for Higher-Order Ising Spin Glass Models on Heavy-Hex Graphs, 2023. LANL report LA-UR-23-33192; to appear.
- (67) IBM ILOG CPLEX. V12.10.0: User’s Manual for CPLEX. International Business Machines Corporation, 46(53):157, 2009.
- (68) J. J. Wallman. Bounding experimental quantum error rates relative to fault-tolerant thresholds, 2016. DOI: 10.48550/arXiv.1511.00727.
- (69) J. J. Wallman and J. Emerson. Noise tailoring for scalable quantum computation via randomized compiling. Physical Review A, 94(5):052325, 2016. DOI: 10.1103/PhysRevA.94.052325.
- (70) J. Dedecker and F. Merlevède. Central limit theorem and almost sure results for the empirical estimator of superquantiles/CVaR in the stationary case. Statistics, 56(1):53–72, 2022. DOI: 10.1080/02331888.2022.2043325.
- (71) NIST Digital Library of Mathematical Functions. https://dlmf.nist.gov/, Release 1.1.11 of 2023-09-15. Available online: https://dlmf.nist.gov/. F. W. J. Olver, A. B. Olde Daalhuis, D. W. Lozier, B. I. Schneider, R. F. Boisvert, C. W. Clark, B. R. Miller, B. V. Saunders, H. S. Cohl, and M. A. McClain, eds.
- (72) A. Matsuo, S. Yamashita, and D. J. Egger. A SAT Approach to the Initial Mapping Problem in SWAP Gate Insertion for Commuting Gates. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E106.A(11):1424–1431, 2023. DOI: 10.1587/transfun.2022eap1159.
- (73) Best practices in quantum optimization. Available online: https://github.com/qiskit-community/qopt-best-practices.
- (74) L. Viola and S. Lloyd. Dynamical suppression of decoherence in two-state quantum systems. Physical Review A, 58(4):2733, 1998. DOI: 10.1103/PhysRevA.58.2733.
- (75) P. Zanardi. Symmetrizing evolutions. Physics Letters A, 258(2–3):77–82, 1999. DOI: 10.1016/S0375-9601(99)00365-5.
- (76) D. Vitali and P. Tombesi. Using parity kicks for decoherence control. Physical Review A, 59(6):4178, 1999. DOI: 10.1103/PhysRevA.59.4178.
- (77) N. Ezzell, B. Pokharel, L. Tewala, G. Quiroz, and D. A. Lidar. Dynamical decoupling for superconducting qubits: a performance survey, 2023. DOI: 10.48550/arXiv.2207.03670.
- (78) V. Tripathi, H. Chen, M. Khezri, K.-W. Yip, E. Levenson-Falk, and D. A. Lidar. Suppression of Crosstalk in Superconducting Qubits Using Dynamical Decoupling. Physical Review Applied, 18(2):024068, 2022. DOI: 10.1103/PhysRevApplied.18.024068.
- (79) P. S. Mundada, A. Barbosa, S. Maity, Y. Wang, T. Merkh, T. Stace, F. Nielson, A. R. Carvalho, M. Hush, M. J. Biercuk, and Y. Baum. Experimental Benchmarking of an Automated Deterministic Error-Suppression Workflow for Quantum Algorithms. Physical Review Applied, 20(2):024034, 2023. DOI: 10.1103/PhysRevApplied.20.024034.
- (80) Z. Zhou, R. Sitler, Y. Oda, K. Schultz, and G. Quiroz. Quantum Crosstalk Robust Quantum Control. Physical Review Letters, 131:210802, 2023. DOI: 10.1103/PhysRevLett.131.210802.
- (81) N. Kanazawa, D. J. Egger, Y. Ben-Haim, H. Zhang, W. E. Shanks, G. Aleksandrowicz, and C. J. Wood. Qiskit experiments: A python package to characterize and calibrate quantum computers. Journal of Open Source Software, 8(84):5329, 2023. DOI: 10.21105/joss.05329.