License: arXiv.org perpetual non-exclusive license
arXiv:2312.15254v1 [quant-ph] 23 Dec 2023

Ecmas: Efficient Circuit Mapping and Scheduling for Surface Code

Mingzheng Zhu², Hao Fu², Jun Wu², Chi Zhang², Wei Xie², Xiang-Yang Li²,³
² University of Science and Technology of China, China
³ Hefei National Laboratory, University of Science and Technology of China, Hefei 230088, China
Abstract

As the leading candidate of quantum error correction codes, surface code suffers from significant overhead, such as execution time. Reducing the circuit’s execution time not only enhances its execution efficiency but also improves fidelity. However, finding the shortest execution time is NP-hard.

In this work, we study the surface code mapping and scheduling problem. To reduce the execution time of a quantum circuit, we first introduce two novel metrics, Circuit Parallelism Degree and Chip Communication Capacity, to quantitatively characterize quantum circuits and chips. Then, we propose a resource-adaptive mapping and scheduling method, named Ecmas, with customized initialization of chip resources for each circuit. Ecmas can dramatically reduce the execution time in both the double defect and lattice surgery models. Furthermore, we provide an additional version, Ecmas-ReSu, for chips with sufficient qubits, which is performance-guaranteed and more efficient. Extensive numerical tests on practical datasets show that Ecmas outperforms the state-of-the-art methods by reducing the execution time by 51.5% on average for the double defect model. Ecmas can reach the optimal result in most benchmarks, reducing the execution time by up to 13.9% for the lattice surgery model.

Index Terms:
Surface Code, Compilation, Execution Time

I Introduction

Quantum algorithms offer exponential speedup over classical algorithms in various fields such as machine learning [18, 30, 14], simulation [11] and cryptography [31]. One of the obstacles to achieving such advantages is the inevitable errors of quantum hardware. The error rate of state-of-the-art superconducting quantum devices is around $10^{-3}$ per operation [37, 2, 13], which falls far short of meeting the demands of practical applications [12]. One approach to handling these errors is quantum error correction (QEC), which establishes the fault-tolerant computational framework [32]. Surface code [5, 7, 22] currently stands as the most promising QEC code, with a threshold error rate of up to $10^{-2}$. Its natural 2-D nearest-neighbor structure makes it well-suited for implementation on superconducting chips.

Applying surface code to protect a quantum circuit involves converting the circuit into an encoded form. Unlike the circuit transformation typical in the NISQ (Noisy Intermediate-Scale Quantum) era, surface code transformation necessitates mapping a single logical qubit to a cluster of physical qubits, known as a tile [19]. As a result, the conditions for executing logical operations differ significantly. For instance, a CNOT gate no longer requires physical qubits to be adjacent on the chip. Instead, it can be implemented by establishing a non-intersecting path between two distinct tiles, called qubit communication. These requirements call for developing an efficient and specialized transformer to transform a quantum circuit into a surface-code-encoded circuit.

The transformation process has two stages: initialization and scheduling. In the initialization stage, the transformer needs to allocate tiles for each logical qubit and allocate channels for communication. In the scheduling stage, the transformer determines the specific execution schemes for each operation. Most operations can be performed within the tiles [10], except T gate and CNOT gate, which are the most resource-consuming logical operations. The substantial overhead of T gates stems from their inability to be fault-tolerantly executed, thereby necessitating the use of supplementary magic state distillation circuits [4]. Through extensive research efforts [8, 26], this overhead has been considerably reduced. However, the time delay induced by CNOT gates is severe, particularly in circuits where quite a lot of CNOT gates can be executed in parallel, such as Ising circuits [20, 29] and the QDNN circuits [34]. This significantly influences the fidelity of the execution result of the surface code circuit. With the same physical error rates and the code distance, a shorter execution time yields higher result fidelity. Thus, an essential goal of transformation is to reduce the execution time of the circuit. However, finding the shortest execution time of a circuit is NP-hard (as proved in Theorem 1), which makes it a non-trivial task for finding an effective result efficiently.

Executing CNOT gates can be simplified as constructing a path between the two involved tiles, regardless of the specific encoding scheme, i.e., double defect model [10] or lattice surgery model [3]. Logical qubits are represented as small boxes in Fig. 1, and channels are the residual regions used to establish the paths (depicted by the lines). CNOT gates can be completed within one clock cycle, regardless of the path length. Simultaneous execution of CNOT gates requires that their corresponding paths do not intersect.

Many works focus on the paths of CNOT gates to reduce the latency caused by path conflicts [19, 17, 3]. Braidflash [19] reduces the latency of CNOT gates on the critical path by assigning priority to CNOT gates to reduce the delay of the conflicts. AutoBraid [17] identifies specific patterns to find non-intersecting paths and EDPCI [3] draws inspiration from the concept of edge-disjoint paths. However, they have all overlooked a crucial aspect: within the context of surface code, the communication resources on the chip are software-defined. We use bandwidth to represent the width of the channel, with which we can adaptively adjust communication resources for varying circuits.

Figure 1: Motivation example: a logical quantum circuit and its corresponding surface code encoded circuit.

Motivation Example: As shown in Fig. 1, five independent gates are ready to be executed. Each gate’s path requires a certain width of physical qubits during execution. Due to the lack of quantitative analysis on the channels, the path occupies the entire channel in the scheduling process of previous works. As a result, no five disjoint paths allow these gates to be executed simultaneously. However, in the optimal case, one clock cycle is enough to execute the circuit with the same chip and tile placement by better allocating the channel resource.

In this paper, we study the surface code circuit mapping and scheduling problem. We propose a resource-adaptive transforming method, Ecmas, which reduces the execution time of circuits on target chips by customizing channel resources for each circuit. The main idea is to characterize the circuit and the chip and then schedule communication resources based on the circuit’s requirements. Our contributions are summarized as follows:

  • We formulate the surface code circuit mapping and scheduling problem and analyze the computational complexity of the double defect model.

  • We define Circuit Parallelism Degree and Chip Communication Capacity to quantitatively characterize circuits and chips. Further, we introduce bandwidth to customize channel resources for each quantum circuit.

  • We propose resource-adaptive mapping and scheduling methods that can be applied to both the double defect and lattice surgery models. With sufficient physical qubits, Ecmas offers Ecmas-ReSu, which can provide performance-guaranteed results efficiently.

  • We evaluate Ecmas for circuits from IBM Qiskit [29], QASMbench [24], etc. With the same chip resource configuration, Ecmas eliminates 51.5% of the execution time on average and 67.3% at most compared with AutoBraid [17] for the double defect model. Ecmas finds the optimal result for most benchmarks for the lattice surgery model. Compared with EDPCI [3], Ecmas achieves optimizations of up to 13.9%.

The rest of this paper is organized as follows. We introduce the background in Section II and then formulate the surface code mapping and scheduling problem in Section III. We describe our methods in Section IV and evaluate their performance in Section V. Related works and the conclusion are given in Section VI and Section VII.

II Background

In this section, we present a brief overview of quantum error correction and surface code model.

II-A Quantum Error Correction

Quantum programs can be described by the quantum circuit model, which consists of a sequence of quantum gates performed upon a collection of qubits. Qubits are the fundamental units in quantum computing which can be represented by a normalized vector. Quantum gates are unitary operations that operate on qubits.

However, quantum computing suffers from the inevitable noise of interactions with the environment and imprecise operations. QEC codes are necessary to build fault-tolerant quantum computing. It encodes a logical qubit with multiple physical qubits, improving reliability. The noise of the quantum system appears not only in the communication process but also in the computation process. Therefore, the quantum circuit must run under the protection of QEC. QEC codes should detect and correct errors periodically during the execution.

II-B Surface Code

Among various QEC codes, surface code is a prominent candidate for achieving fault-tolerant quantum computation in superconducting implementations. It has a high threshold of around 1% and alignment with 2D architectures, making it a feasible error correction code for practical demonstrations on real machines [1, 23, 39].

As shown in Fig. 2, surface code is realized on a 2D lattice of physical qubits, including data and measurement qubits. Data qubits store quantum states, while measurement qubits identify error occurrences. Based on the measurement circuit, measurement qubits are categorized as X-stabilizers and Z-stabilizers. During the execution, measurement circuits are periodically executed to detect the errors. The time for executing one measurement circuit is called a surface code cycle. Surface code can be classified as double defect [10] and lattice surgery [3] based on the different approaches to creating logical qubits.

Figure 2: (a) Surface code implementation on 2-D lattice, the white circles are data qubits and the gray circles are measurement qubits, (b) X-cut tile, (c) Z-cut tile.

II-B1 Double Defect Model

In the double defect model, a logical qubit is created by turning off two defects of the same type. According to the type of defects, the logical qubit is initialized as X-cut or Z-cut, as shown in Fig. 2(b) and Fig. 2(c). The code distance $d$ determines the number of errors that surface code can detect and correct. All single-qubit gates can be executed in software or locally, involving only the physical qubits around the two defects, under the assumption in [19] that a steady supply of magic state qubits is available at the location of the data. We denote these physical qubits as tiles and the rest of the qubits on the chip as channels.

A CNOT gate requires communication between the control and target qubits, achieved by performing braiding operations in the channels. A braiding operation turns off the involved measurement qubits on the braiding path. It follows the topological rules: the braiding paths are equivalent as long as the starting and ending tiles are the same. Braiding operations of any length can be executed within $2d$ surface code cycles, equivalent to one clock cycle. The braiding operation can only be performed between logical qubits with different cut types. In practice, each tile contains two double-defect logical qubits, one for computation and one for ancilla. There are two ways to perform a CNOT gate with qubits of the same cut type. One is to use three braiding operations with the ancilla qubit, as shown in Fig. 3(a). The other is to modify the cut type of the tile and then perform the braiding operation, as shown in Fig. 3(b). They require three cycles and four cycles, respectively.

Figure 3: CNOT gate between logical qubits of the same cut type: (a) direct implementation, using three braiding operations without changing the cut type, (b) modifying implementation, changing the cut type and then executing the CNOT gate.

II-B2 Lattice Surgery Model

Lattice surgery eliminates the holes within tiles and uses the rotated surface code (as shown in Fig. 5) to reduce the requirement of qubit resources for surface code with the same code distance. CNOT gates in lattice surgery are attained by conducting ZZ measurements between neighboring tiles. A straightforward approach for a CNOT gate at a distance $k$ involves continuously swapping logical qubits until they are adjacent, requiring a minimum of $k \times d$ surface code cycles to complete. Another method involves constructing Bell states using ancilla qubits, achievable within $2 \times d$ surface code cycles, i.e., one clock cycle, as shown in Fig. 4.

Figure 4: The CNOT gate implementation in lattice surgery model by constructing Bell states.
Figure 5: Simplified tile models: (a) double defect model, (b) lattice surgery model.

III System Model and Problem Formulation

In this section, we formally define the surface code mapping and scheduling problem for both the double defect and lattice surgery models and demonstrate the complexity of the problem under the double defect model.

Quantum circuit: We consider an input quantum circuit $P$ with $n$ logical qubits (Fig. 6(a)). Since single-qubit gates can be implemented by software or locally in a tile, we only consider CNOT gates in this work. Generally, $P$ can be represented as a directed acyclic graph (DAG) $G_P$, as shown in Fig. 6(b). In $G_P$, each node is a CNOT gate, and edges indicate the dependency between gates. The critical path length of $G_P$ is the circuit depth, denoted as $\alpha$. The communication graph $G_C$ is another representation of a quantum circuit, as shown in Fig. 6(c), where each vertex is a logical qubit, edges indicate CNOT gates between the qubits, and the weight of an edge is the number of the corresponding CNOT gates.

Figure 6: Three representations of quantum circuit: (a) original, (b) DAG, (c) communication graph.
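The two graph representations can be illustrated with a minimal Python sketch. It assumes the circuit is given as a list of (control, target) CNOT pairs in program order; the function names `build_graphs` and `circuit_depth` are hypothetical and not part of Ecmas.

```python
from collections import Counter, defaultdict

def build_graphs(cnots):
    """Build the gate DAG and the communication graph of a CNOT-only circuit.

    cnots: list of (control, target) pairs in program order (an assumed input format).
    Returns (parents, comm): parents[g] is the set of gates that gate g depends on,
    comm[{qi, qj}] is the number of CNOT gates between qubits qi and qj.
    """
    last_gate_on = {}             # qubit -> index of the latest gate touching it
    parents = defaultdict(set)    # DAG edges, stored as child -> set of parents
    comm = Counter()              # edge weights of the communication graph
    for g, (c, t) in enumerate(cnots):
        for q in (c, t):
            if q in last_gate_on:
                parents[g].add(last_gate_on[q])
            last_gate_on[q] = g
        comm[frozenset((c, t))] += 1
    return parents, comm

def circuit_depth(cnots, parents):
    """Critical path length (alpha) of the DAG; program order is a topological order."""
    depth = [0] * len(cnots)
    for g in range(len(cnots)):
        depth[g] = 1 + max((depth[p] for p in parents[g]), default=0)
    return max(depth, default=0)

cnots = [(0, 1), (2, 3), (1, 2), (0, 1)]
parents, comm = build_graphs(cnots)
print(circuit_depth(cnots, parents))   # 3
print(comm[frozenset((0, 1))])         # 2
```

In this toy circuit the first two gates are independent, the third depends on both, and the fourth depends on the first and third, so the depth is 3.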

Quantum chip: We assume that the topology of a quantum chip is a 2D lattice of physical qubits, where each qubit is associated with four adjacent qubits, except the qubits on the chip’s boundary. We use $L_{m_1 \times m_2}$ to denote the 2D chip with $m_1$ rows and $m_2$ columns of physical qubits.

Surface Code Encoded Circuit: The encoded circuit $P^S$ should satisfy the following two constraints. First, the execution scheme should be equivalent to the logical circuit, i.e., all gates are scheduled, and the scheduling order is consistent with the topological sort of gates in $G_P$. Second, the CNOT paths of the gates executed in one cycle do not intersect. The execution time of a circuit is $\Delta \times 2d \times \tau$, where $\Delta$ is the cycle number and $\tau$ is the execution time of each surface code cycle. Since $d$ and $\tau$ have the same effect on different mapping and scheduling methods, we simplify the execution time as the cycle number $\Delta$.
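As a rough illustration of the first constraint and of the execution time formula, the sketch below checks that a layered schedule covers every gate exactly once and respects the DAG dependencies, and evaluates $\Delta \times 2d \times \tau$. The data layout (a list of layers, a `parents` dictionary) and the function names are assumptions made for this example; the second constraint (non-intersecting paths) depends on the chip layout and is not checked here.

```python
def is_valid_order(layers, parents, num_gates):
    """Check the dependency constraint of an encoded circuit.

    layers: list of lists of gate indices, one list per clock cycle.
    parents: dict mapping a gate to the gates it depends on.
    num_gates: gates are assumed to be numbered 0 .. num_gates - 1.
    """
    scheduled = sorted(g for layer in layers for g in layer)
    if scheduled != list(range(num_gates)):
        return False              # some gate is missing or scheduled twice
    cycle_of = {g: i for i, layer in enumerate(layers) for g in layer}
    return all(cycle_of[p] < cycle_of[g]
               for g in cycle_of for p in parents.get(g, ()))

def execution_time(num_cycles, d, tau):
    """Delta * 2d * tau, where tau is the duration of one surface code cycle."""
    return num_cycles * 2 * d * tau

parents = {2: {0, 1}, 3: {0, 2}}
print(is_valid_order([[0, 1], [2], [3]], parents, 4))   # True
print(is_valid_order([[0, 1, 2], [3]], parents, 4))     # False: gate 2 runs with its parents
print(execution_time(3, 5, 1))                          # 30 units of tau
```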

Surface Code Mapping and Scheduling Problem: Given an input quantum circuit $P$, a specific quantum chip $L_{m_1 \times m_2}$, and the required code distance $d$, find an initial mapping and an execution scheme that satisfy the surface code circuit constraints such that the cycle number of the circuit, $\Delta_{P^S}$, is minimized.

Next, we will refine the models for double defect and lattice surgery approaches. We further provide detailed descriptions of each model and offer formal problem definitions.

III-A Double Defect Model

Tile: A tile is a square array of $5d \times 5d$ physical qubits, as shown in Fig. 5(a). Each tile contains two logical qubits, one for mapping logical qubits and one for ancilla. We use $(T_{a,b}, Cut_i)$ to denote tile $T_i$, where $(a,b)$ is the position of the upper left corner of the tile and $Cut_i$ is its cut type.

Channel: A channel is used to perform braiding operations. Each braiding path requires a width of $2.5d$ physical qubits. We consider the occupation of a braiding path within a channel as a lane. We introduce bandwidth to characterize the number of lanes in each channel. The bandwidth of a channel $C_i$ is $\lfloor \frac{W_i}{2.5d} \rfloor$, where $W_i$ is the number of physical qubits in the width of the channel $C_i$. Then, we consider the minimum bandwidth of channels within the chip as the chip’s bandwidth.
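The bandwidth definition can be made concrete with a short Python sketch. The function names are hypothetical; the sketch rewrites $\lfloor W_i / 2.5d \rfloor$ as the integer division $\lfloor 2W_i / 5d \rfloor$ to avoid floating-point rounding.

```python
def dd_channel_bandwidth(width_qubits, d):
    """Lanes in a double defect channel: floor(W / (2.5 * d)), computed with integers."""
    return (2 * width_qubits) // (5 * d)

def dd_chip_bandwidth(channel_widths, d):
    """Chip bandwidth: the minimum bandwidth over all channels on the chip."""
    return min(dd_channel_bandwidth(w, d) for w in channel_widths)

d = 5                                        # each lane needs 2.5 * 5 = 12.5 qubits of width
print(dd_channel_bandwidth(25, d))           # 2
print(dd_channel_bandwidth(24, d))           # 1
print(dd_chip_bandwidth([25, 13, 40], d))    # 1, limited by the narrowest channel
```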

Fig. 7 illustrates the process of surface code transformation for the double defect model. We present the formal description as follows.

Figure 7: A five-step execution scheme for the quantum circuit in Fig. 6 using the double defect model, where the gray boxes are for X-cut tiles and the white boxes are for Z-cut tiles.
Problem 1

Initialization Problem for Double Defect.
Input: A logical circuit $P$, a 2D lattice chip $L_{m_1 \times m_2}$, the required code distance $d$, and a natural number $k$.
Output: Whether there is an initial tile mapping $T_{mapping} = \{q_i \rightarrow (T_{a,b}, Cut_i)\}$ such that the number of cycles of the optimal surface code encoded circuit $P^S_{OPT}$ is upper bounded by $\alpha + k$, namely, $\Delta^{P^S_{OPT}} < \alpha + k$.

Problem 2

Scheduling Problem for Double Defect.
Input: A logical circuit $P$, a 2D lattice chip $L_{m_1 \times m_2}$, and an initial tile mapping $T_{mapping}$.
Output: A surface code encoded circuit $P^S$ with its number of cycles $\Delta^{P^S}$ minimized.

Hardness:

Theorem 1

The surface code tile initialization problem for double defect model is NP-hard.

Proof sketch:

We reduce the 3-SAT problem to the Initialization Problem for the double defect model. For more details, please refer to Appendix A.

III-B Lattice Surgery Model

Tile: As shown in Fig. 5(b), each small box represents a tile that can be mapped as a logical qubit, with $\lceil \sqrt{2}d \rceil \times \lceil \sqrt{2}d \rceil$ physical qubits. We denote tile $T_i$ by $T_{a,b}$, where $(a,b)$ represents the upper left position of this tile.

Channel: Each channel is composed of tiles, which are ancilla logical qubits used to generate Bell states for communication. Since both logical qubits and channels are constructed from tiles, the width of a path and a tile are the same, consisting of $d$ physical qubits. The bandwidth of a channel $C_i$ is given by $\lfloor \frac{W_i}{\lceil \sqrt{2}d \rceil} \rfloor$. The chip’s bandwidth is the minimal bandwidth of its channels.
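To compare the qubit footprint of the two tile models and to evaluate the lattice surgery bandwidth formula, a minimal sketch is given below. The helper names are hypothetical; the tile sizes ($5d \times 5d$ and $\lceil \sqrt{2}d \rceil \times \lceil \sqrt{2}d \rceil$) are the ones stated in Section III.

```python
import math

def dd_tile_qubits(d):
    """Physical qubits in one double defect tile (5d x 5d)."""
    return (5 * d) ** 2

def ls_tile_side(d):
    """Side length of one lattice surgery tile, ceil(sqrt(2) * d)."""
    return math.ceil(math.sqrt(2) * d)

def ls_channel_bandwidth(width_qubits, d):
    """Lanes in a lattice surgery channel: floor(W / ceil(sqrt(2) * d))."""
    return width_qubits // ls_tile_side(d)

d = 5
print(dd_tile_qubits(d), ls_tile_side(d) ** 2)   # 625 vs 64 physical qubits per tile
print(ls_channel_bandwidth(16, d))               # 2 lanes, since each lane is 8 qubits wide
```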

Fig. 8 shows the process of surface code transformation for the lattice surgery model. Below, we present the formal depiction of these problems.

Figure 8: A three-step execution scheme for the quantum circuit in Fig. 6 using the lattice surgery model.
Problem 3

Initialization Problem for Lattice Surgery.
Input: A logical circuit $P$, a 2D lattice chip $L_{m_1 \times m_2}$, the required code distance $d$, and a natural number $k$.
Output: Whether there is an initial tile mapping $T_{mapping} = \{q_i \rightarrow T_{a,b}\}$ such that $\Delta^{P^S_{OPT}} < \alpha + k$, namely, the number of cycles of the optimal surface code encoded circuit $P^S_{OPT}$ is upper bounded by $\alpha + k$.

Problem 4

Scheduling Problem for Lattice Surgery.
Input: A logical circuit $P$, a 2D lattice chip $L_{m_1 \times m_2}$, and an initial tile mapping $T_{mapping}$.
Output: A surface code encoded circuit $P^S$ with its number of cycles $\Delta^{P^S}$ minimized.

Hardness: Herr et al. [15] demonstrated that the surface code mapping and transforming problem is NP-complete for the lattice surgery model.

IV System Design

It is a non-trivial task to optimize circuit mapping and scheduling on limited qubit resources. To address the problem, we first introduce two novel metrics, Circuit Parallelism Degree and Chip Communication Capacity (Section IV-A), to characterize quantum circuits and chips. Then, we propose the resource-adaptive algorithm Ecmas (Section IV-B) with customized initialization of chip resources for each circuit. Further, with sufficient physical qubits on the chip, Ecmas-ReSu achieves a shorter transforming time and a performance-guaranteed result. An overview of our comprehensive toolflow is shown in Fig. 9.

Figure 9: Overview of Ecmas.

IV-A Pre-processing

IV-A1 Quantum Circuit Profiling

Different quantum circuits may have various demands on communication resources. We introduce Circuit Parallelism Degree (denoted as $\mathbb{PM}$) to characterize the maximum demand of communication resources of a circuit.

Definition 1

Circuit Parallelism Degree: Given a quantum circuit $P$ with DAG $G_P = (V, E)$, a partition $\pi$ divides the nodes $v \in V$ into $\Delta_{G_P}$ disjoint sets $V_1, V_2, \ldots, V_{\Delta_{G_P}}$, such that for $u \in V_i$ and $v \in V_j$, if $(u,v) \in E$, then $i > j$. The Circuit Parallelism Degree is $\mathbb{PM} = \min_{\pi} \max_{i=1}^{\Delta_{G_P}} |V_i|$.

Finding $\mathbb{PM}$ is equivalent to the following scheduling problem: given $n$ tasks and their precedence constraints, minimize the number of machines used while keeping the whole schedule at minimum length. Finke [9] has proved that this problem is NP-complete.

We propose a heuristic algorithm (Algorithm Para-Finding) to find an estimate of the circuit’s Circuit Parallelism Degree, $\widetilde{\mathbb{PM}}$, and the corresponding execution order. Our method uses layers to keep track of the execution order of gates, where $layer_1$ represents the operations to be performed in the first clock cycle. For any gate $i$, we record two values, the highest and lowest layers in which the gate can be scheduled. Layers are determined by the gate’s parent and child nodes, denoted as $parent_i$ and $child_i$: $Low_i = \max_{j \in parent_i} Low_j + 1$ and $High_i = \min_{j \in child_i} High_j - 1$. Then, we calculate the difference between each gate’s high and low values and choose the gate with the smallest difference. For this gate, we schedule it to the layer with the fewest gates among all its possible layers. After that, we update the low values of its child nodes and the high values of its parent nodes. We repeat this process until all the gates are scheduled. The maximum number of gates per layer is the $\widetilde{\mathbb{PM}}$ of this circuit.
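The heuristic described above can be sketched in Python as follows. This is an illustrative reading of the description rather than the authors’ Algorithm Para-Finding: it assumes gates are numbered 0 to n-1 in topological order, that an edge (u, v) means gate u must run before gate v, that ties are broken by lowest index, and that window updates are propagated transitively to all descendants and ancestors.

```python
def para_finding(num_gates, edges):
    """Estimate the Circuit Parallelism Degree and assign each gate to a layer.

    edges: (u, v) pairs with u < v, meaning gate u must run before gate v.
    Returns (pm_estimate, layer_of), with layers numbered from 1.
    """
    parents = [[] for _ in range(num_gates)]
    children = [[] for _ in range(num_gates)]
    for u, v in edges:
        parents[v].append(u)
        children[u].append(v)

    # Lowest feasible layer: one more than the latest parent.
    low = [1] * num_gates
    for g in range(num_gates):
        for p in parents[g]:
            low[g] = max(low[g], low[p] + 1)
    depth = max(low, default=0)
    # Highest feasible layer: one less than the earliest child.
    high = [depth] * num_gates
    for g in reversed(range(num_gates)):
        for c in children[g]:
            high[g] = min(high[g], high[c] - 1)

    load = [0] * (depth + 1)      # load[k] = number of gates placed in layer k
    layer_of = {}
    while len(layer_of) < num_gates:
        # Choose the unscheduled gate with the smallest High - Low difference.
        g = min((x for x in range(num_gates) if x not in layer_of),
                key=lambda x: high[x] - low[x])
        # Put it in the least-loaded layer of its feasible window.
        k = min(range(low[g], high[g] + 1), key=lambda j: load[j])
        layer_of[g] = k
        load[k] += 1
        low[g] = high[g] = k
        # Tighten the windows of later gates (Low) and earlier gates (High).
        for x in range(g + 1, num_gates):
            for p in parents[x]:
                low[x] = max(low[x], low[p] + 1)
        for x in reversed(range(g)):
            for c in children[x]:
                high[x] = min(high[x], high[c] - 1)
    return max(load, default=0), layer_of

# Gates 0 -> 3 -> 4 form the critical path; gates 1 and 2 are free to move.
pm, layer_of = para_finding(5, [(0, 3), (3, 4)])
print(pm)   # 2
```

In this example the critical-path gates are fixed first, and the two free gates are then spread over the least-loaded layers, giving at most two gates per layer instead of three.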

IV-A2 Quantum Chip Analyzing

We define the Chip Communication Capacity to characterize the number of parallel CNOT gates supported by a chip, denoted as $\mathbb{C}$. According to [17], any three CNOT gates can be executed simultaneously. As we refine the chip model, we generalize the previous theorem to the case that any $\lfloor \frac{b-1}{2} \rfloor + 3$ braiding operations can be executed simultaneously, where $b$ is the chip’s bandwidth.

Definition 2

Chip Communication Capacity: Given a quantum chip, $\mathbb{C}$ is the maximum number $u$ such that, for any $u$ independent CNOT gates with an arbitrary placement of tiles, there exists a simultaneous path schedule for all CNOT gates.

Theorem 2

For a chip with bandwidth $b$, given an arbitrary placement of the operand qubits, there exists a simultaneous path schedule for $\lfloor\frac{b-1}{2}\rfloor+3$ gates.

Proof: According to Autobraid [17], any three CNOT operations can always be executed simultaneously on a chip with bandwidth 1. The path of an additional CNOT gate has to intersect with others if and only if one involved tile is inside a ring and the other is outside. A ring is composed of paths and fully occupied channels. Increasing the bandwidth of each channel on the chip by two would break this ring and enable a path connecting two arbitrary tiles on the chip. Therefore, when the chip's bandwidth is $b$, paths exist for $\lfloor\frac{b-1}{2}\rfloor+3$ CNOT operations to be executed in parallel.
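The bound of Theorem 2 is easy to evaluate for a given bandwidth. The following sketch computes it and applies the comparison against the estimated parallelism degree that the scheduling stage uses to distinguish limited from sufficient resources. The function names are hypothetical and introduced only for illustration.

```python
def chip_communication_capacity(bandwidth: int) -> int:
    """Number of CNOT gates guaranteed to run in parallel: floor((b-1)/2) + 3."""
    if bandwidth < 1:
        raise ValueError("bandwidth must be at least 1")
    return (bandwidth - 1) // 2 + 3


def resources_sufficient(estimated_parallelism: int, bandwidth: int) -> bool:
    """True when the guaranteed capacity covers the estimated parallelism degree."""
    return chip_communication_capacity(bandwidth) >= estimated_parallelism


for b in range(1, 6):
    print(b, chip_communication_capacity(b))
```

Note that the guaranteed capacity grows by one for every two additional units of bandwidth, in line with the ring-breaking argument in the proof.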

IV-B Transforming

IV-B1 Initial Mapping

To generate a preferred tile location mapping, we employ the following three steps (Line 1-1 in Algorithm 1):

Shape Determining. First, we determine the shape of the logical tile array, e.g., whether to initialize it as a $3\times 3$ array or a $2\times 4$ array for a circuit with eight logical qubits when both schemes are available on the chip. We then select the array shape with the minimum perimeter. As shown in Fig. 10, we choose the $3\times 3$ tile array.

Mapping Establishing. Second, we map each logical qubit to its corresponding logical tile according to communication cost, as shown in Fig. 10. The communication cost is calculated by the cost function $f=\sum_{i,j}\gamma_{i,j}\times l_{i,j}$, where $l_{i,j}$ represents the Manhattan distance between the two tiles $T_i$ and $T_j$, and $\gamma_{i,j}$ is the number of $CNOT_{i,j}$ gates in the circuit. Mapping qubits that frequently communicate close together effectively reduces the communication cost. Here, we employ the Metis [21] method, an iterative graph partitioner, to generate mappings based on the qubit communication graph $G_C$ and the tile array. Due to the stochastic steps in the mapping generation, we generate multiple mappings and select the one with minimal communication cost as our final result.
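The cost function $f$ and the final selection step can be sketched as follows. This is an illustration only: candidate mappings are assumed to be given as dictionaries from logical qubit to a (row, column) tile coordinate, and the names `communication_cost` and `select_mapping` are hypothetical. Generating the candidates with a graph partitioner such as Metis is outside the scope of the sketch.

```python
def communication_cost(cnot_counts, placement):
    """f = sum over qubit pairs of gamma_ij * Manhattan distance l_ij.

    cnot_counts : dict {(qi, qj): number of CNOT gates between qi and qj}
    placement   : dict {qubit: (row, col)} giving each qubit's tile
    """
    total = 0
    for (qi, qj), gamma in cnot_counts.items():
        (r1, c1), (r2, c2) = placement[qi], placement[qj]
        total += gamma * (abs(r1 - r2) + abs(c1 - c2))
    return total


def select_mapping(cnot_counts, candidates):
    """Return the candidate placement with the smallest communication cost."""
    return min(candidates, key=lambda m: communication_cost(cnot_counts, m))


# Qubits 0 and 1 interact three times, qubits 1 and 2 once.
counts = {(0, 1): 3, (1, 2): 1}
near = {0: (0, 0), 1: (0, 1), 2: (1, 1)}
far = {0: (0, 0), 1: (1, 1), 2: (0, 1)}
print(communication_cost(counts, near), communication_cost(counts, far))
```

In the example, the placement that keeps the frequently interacting pair adjacent has the lower cost and is the one selected.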

Bandwidth Adjusting. Finally, we assign the remaining qubit resources to the channels based on their occupancy, as illustrated in Fig. 10. We pre-execute each gate in the circuit to record its shortest path without considering the non-intersection restriction. Then, we increase the bandwidth of the channels that carry the most paths. In most cases, this process effectively reduces the waiting caused by channel resource occupation.
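One possible reading of the bandwidth adjusting step is sketched below. The sketch assumes that the pre-execution has already produced, for each gate, the list of channels on its shortest path, that every channel starts with bandwidth 1, and that spare capacity is handed out one unit at a time to the channel with the highest load per unit of bandwidth. The channel identifiers, the function name `adjust_bandwidth`, and the load-per-bandwidth rule are assumptions made for illustration; the text only states that the most heavily used channels are widened.

```python
from collections import Counter


def adjust_bandwidth(paths, channels, extra_units):
    """Distribute spare bandwidth units over channels by recorded path usage.

    paths       : list of paths, each a list of channel ids it traverses
    channels    : list of all channel ids on the chip
    extra_units : number of spare bandwidth units to distribute
    """
    usage = Counter(ch for path in paths for ch in path)
    bandwidth = {ch: 1 for ch in channels}
    for _ in range(extra_units):
        # Widen the channel that is currently the most congested.
        busiest = max(channels, key=lambda ch: usage[ch] / bandwidth[ch])
        bandwidth[busiest] += 1
    return bandwidth


paths = [["h0"], ["h0", "v0"], ["h0"], ["v0"]]
print(adjust_bandwidth(paths, ["h0", "h1", "v0"], extra_units=2))
```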

Figure 10: Tile location mapping process: (a) shape determining, (b) mapping establishing, (c) bandwidth adjusting.
Input: Quantum circuit $P$ and chip $L_{m_1\times m_2}$
Output: Encoded circuit $P^S$
1 tile_array = Tile_shaping($L_{m_1\times m_2}$);
2 Mappings = Metis($G_C$);
3 M_location = Select(Mappings, cost function);
4 if double defect model then
5     M_cut = bipartite($G_P$);
6 end if
7 while $G_P$ not empty do
8     gates = $G_P$.front_gate;
9     gates_pri = priority(gates);
10     for $g_i(q_a,q_b)\leftarrow$ gates_pri.begin() to gates_pri.end() do
11         if $Cut_a\neq Cut_b$ or model = lattice surgery then
12             $P^S$.add(path(gate, chip_now));
13         else
14             $M_a={M_t}_a+\theta{M_s}_a$;
15             $M_b={M_t}_b+\theta{M_s}_b$;
16             $min\_value, min\_index=\min(M_a,M_b)$;
17             if $min\_value<0$ then
18                 $P^S$.add($Cut_{min\_index}$ modification);
19             else
20                 $P^S$.add(path(gate, chip_now));
21             end if
23         end if
25     end for
27 end while
return $P^S$, M_location
Algorithm 1 Scheduling for Limited Resources

IV-B2 Scheduling

Considering the qubit resources on the target chip may be limited or sufficient, we design two algorithms to maximize the utilization of resources.

Scheduling for Limited Resources. When physical qubit resources are limited, i.e., when $\widetilde{\mathbb{PM}}>\lfloor\frac{b-1}{2}\rfloor+3$, it may be difficult to find non-intersecting paths for all currently executable gates. However, the currently executable gates differ in how many gates depend on them. We assign priorities to these gates, effectively reducing latency at the bottleneck (Line 1-1 in Algorithm 1). The priority of a gate is determined by its number of remaining gates (how many gates depend on it) and its criticality (the length of the critical path of the remaining gates). Gates with higher criticality are prioritized. When two CNOT gates have the same criticality, we select the gate with more remaining gates, so that more gates can execute earlier and non-congested cycles are better utilized.
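The two-level priority rule can be sketched as a sort key. In this illustration, criticality is taken to be the number of gates on the longest dependency chain starting at a gate, and the remaining gates number is taken to be the count of its transitive dependents. Both interpretations, the function name `prioritize`, and the requirement that a topological order of the remaining gates is supplied are assumptions.

```python
def prioritize(ready, children, topo_order):
    """Order ready gates by criticality, then by number of dependent gates.

    ready      : gates whose dependencies are all satisfied
    children   : dict {gate: list of gates that directly depend on it}
    topo_order : all remaining gates in topological order
    """
    # Criticality: length of the longest chain that starts at the gate.
    crit = {}
    for g in reversed(topo_order):
        crit[g] = 1 + max((crit[c] for c in children.get(g, ())), default=0)

    def dependents(g):
        # Count every gate reachable from g in the dependency DAG.
        seen, stack = set(), list(children.get(g, ()))
        while stack:
            x = stack.pop()
            if x not in seen:
                seen.add(x)
                stack.extend(children.get(x, ()))
        return len(seen)

    # Higher criticality first; ties broken by more dependents first.
    return sorted(ready, key=lambda g: (-crit[g], -dependents(g)))


children = {"a": ["c"], "c": ["d"], "b": ["e", "f"], "h": ["i"]}
order = ["a", "b", "h", "c", "e", "f", "i", "d"]
print(prioritize(["h", "b", "a"], children, order))
```

In the example, gate `a` heads the longest chain and goes first, while `b` and `h` have equal criticality and are separated by the number of gates that depend on them.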

The time complexity of this algorithm is $O(g*m_1*m_2)$, where $g$ is the number of CNOT gates in the quantum circuit. The algorithm searches for paths for at most $O(g)$ gates, and the maximum time required to find a path for each CNOT gate is $O(m_1*m_2)$. Here, $O(m_1*m_2)$ bounds the number of nodes available for path selection on the chip, which is $m_1*m_2/(d*d)$, where $d$ is the code distance and the side length of each tile.

Input: Execution scheme $E$
Output: Encoded circuit $P^S$, initialization
1 now_step = 0;
2 while $i<E.length()$ do
3     while $G$ is bipartite graph do
4         for $gate\leftarrow E[i].begin()$ to $E[i].end()$ do
5             G.add_edge(gate);
6         end for
7         i++;
8         $M_c$ = bipartite($G$);
9     end while
10     if having mapping then
11         $P^S$.add(change mapping to $M_c$);
12     else
13         initialization = $M_c$;
14     end if
15     for $j\leftarrow now\_step$ to $i$ do
16         find braiding path($E[j]$);
17         $P^S$.add($E[j]$);
18     end for
19     now_step = i;
20 end while
return $P^S$, initialization
Algorithm 2 Scheduling for Sufficient Resources

Scheduling for Sufficient Resources. When $\lfloor\frac{b-1}{2}\rfloor+3\geq\widetilde{\mathbb{PM}}$, an execution scheme can be rapidly derived from Algorithm Para-Finding and Theorem 2. Algorithm Para-Finding provides $\widetilde{\mathbb{PM}}$ for this quantum circuit and a CNOT gate order scheme that achieves $\widetilde{\mathbb{PM}}$. This scheme indicates which gates are executed in each time cycle. Since the number of gates executed in each cycle is at most $\lfloor\frac{b-1}{2}\rfloor+3$, the corresponding paths for these gates can be determined with the method outlined in the proof of Theorem 2.

IV-C Optimizations for Double Defect Model

Previous works, Braidflash [19] and Autobraid [17], do not consider the cut type and assume that all tiles have the same cut type. However, the cut type is critical when transforming circuits for the double defect model, and ignoring it leaves a significant opportunity to reduce execution time on the table. A CNOT gate takes three cycles to execute if the two involved tiles are of the same cut type, but only one cycle when the cut types are different.

IV-C1 Cut Type Initialization


The goal of the cut type initialization is to enable the execution of as many CNOT gates as possible within a single cycle. If the qubit communication graph is bipartite, we assign the same cut type to the logical qubits in the same set. This is the optimal cut type initialization, with which all CNOT gates can be implemented in one cycle.

However, for circuits whose qubit communication graph is not bipartite, finding the optimal cut type initialization is NP-hard, according to Theorem 1. We propose a greedy algorithm that satisfies the cut type requirements of the gates executed earlier. To this end, we first construct a sub-graph of the qubit communication graph in which each vertex corresponds to a logical qubit. Then, we add the gates with no predecessor in the current DAG to the sub-graph as edges. Next, we remove these gates from the DAG. We repeat these two steps until the newly added edges make the sub-graph no longer bipartite. The logical qubits belonging to the same set in this bipartite sub-graph are initialized to the same cut type.
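The greedy initialization can be sketched with a standard 2-colouring test. The sketch assumes that the front layers of the DAG have already been peeled off and are passed in as lists of (control, target) qubit pairs, and that qubits untouched by the accepted layers default to cut type 0. The function names and these conventions are illustrative assumptions.

```python
def two_color(n_qubits, edges):
    """Return a list of 0/1 colours if the graph is bipartite, else None."""
    adj = [[] for _ in range(n_qubits)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    color = [None] * n_qubits
    for start in range(n_qubits):
        if color[start] is not None:
            continue
        color[start] = 0
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if color[v] is None:
                    color[v] = 1 - color[u]
                    stack.append(v)
                elif color[v] == color[u]:
                    return None  # odd cycle found
    return color


def init_cut_types(n_qubits, front_layers):
    """Accept front layers greedily while the sub-graph stays bipartite."""
    edges = []
    cut_types = two_color(n_qubits, edges)
    for layer in front_layers:
        trial = two_color(n_qubits, edges + list(layer))
        if trial is None:
            break  # this layer would make the sub-graph non-bipartite
        edges += list(layer)
        cut_types = trial
    return cut_types


print(init_cut_types(3, [[(0, 1)], [(1, 2)], [(0, 2)]]))
```

In the example, the third layer would close a triangle, so the initialization is derived from the first two layers only and gives qubits 0 and 2 the same cut type.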

IV-C2 Scheduling

When the two tiles involved in a CNOT gate are of the same cut type, we estimate the impact of modifying the cut type by calculating the M-value of each tile, specifically $\text{M-value}=M_t+\theta\times M_s$. $M_t$ is the impact on time. It takes three cycles to execute the operation directly and four cycles with modification. If the tile was previously idle, the modification operation can be performed earlier to reduce the time cost. $M_s$ is the impact on channel occupation. A CNOT gate needs two braiding operations between the tiles without changing the cut type but only one after modification. We adopt a look-ahead strategy that considers the impact of this modification on the child gates of this gate. The parameter $\theta$ determines the weights of the two factors, $M_t$ and $M_s$, in the current situation, where $\theta=(|\text{ready gate}|\times 2)/\text{bandwidth}\times n$. We choose to modify the type of the tile when the M-value is greater than 0 (Line 1-1 in Algorithm 1).

IV-C3 Sufficient Scheduling

When physical qubit resources are sufficient, we adopt the methods in Section IV-B2 to determine the tile location mapping and gate schedule scheme. The key idea for cut type initialization and scheduling is to make all CNOT gates execute in one cycle by remapping the cut type.

We propose the cut type scheduling algorithm, Algorithm 2, whose execution flow is as follows. First, we construct the qubit communication graph by sequentially adding edges from the execution scheme until it is no longer bipartite. Then, we use the bipartition of this graph to initialize the cut type for executing this sub-circuit. When the operand tiles of a CNOT gate are of the same cut type, our method spends three cycles to modify the cut types to a new mapping found in the same way. These two steps are iterated until all the gates have been scheduled. The cut type scheduling algorithm has a $\frac{5}{2}$-approximation guarantee (as shown in Theorem 3).
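The segmentation at the heart of this flow can be sketched as follows. Layers of the execution scheme are grouped greedily while their combined qubit communication graph stays bipartite; each group can then run with one fixed cut type assignment. The bipartiteness test uses a union-find structure with parity, which is a swapped-in technique chosen for brevity rather than something the text specifies. The cycle estimate assumes three cycles per cut type change between consecutive groups and one cycle per layer. All names are hypothetical.

```python
class ParityDSU:
    """Union-find that tracks each node's colour parity relative to its root."""

    def __init__(self):
        self.parent, self.parity = {}, {}

    def find(self, x):
        self.parent.setdefault(x, x)
        self.parity.setdefault(x, 0)
        p = 0
        while self.parent[x] != x:
            p ^= self.parity[x]
            x = self.parent[x]
        return x, p

    def add_edge(self, a, b):
        """Require a and b to have different cut types; False if impossible."""
        (ra, pa), (rb, pb) = self.find(a), self.find(b)
        if ra == rb:
            return pa != pb
        self.parent[ra] = rb
        self.parity[ra] = pa ^ pb ^ 1
        return True


def is_bipartite(edges):
    dsu = ParityDSU()
    return all(dsu.add_edge(a, b) for a, b in edges)


def segment_layers(layers):
    """Group consecutive layers whose union graph remains bipartite."""
    segments, current, edges = [], [], []
    for layer in layers:
        if current and not is_bipartite(edges + list(layer)):
            segments.append(current)
            current, edges = [], []
        current.append(layer)
        edges = edges + list(layer)
    if current:
        segments.append(current)
    return segments


def estimated_cycles(segments):
    """One cycle per layer plus three cycles per cut type change."""
    return sum(len(s) for s in segments) + 3 * (len(segments) - 1)


layers = [[(0, 1), (2, 3)], [(1, 2)], [(0, 2)], [(0, 1)]]
segments = segment_layers(layers)
print(len(segments), estimated_cycles(segments))
```

Since each layer is a matching, a single layer is always bipartite, so the greedy grouping never gets stuck on an empty group.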

Lemma 1

The qubit communication graph generated by any two layers of gates is bipartite.

Proof: Since each logical qubit can participate in at most one CNOT gate in each layer, the qubit communication sub-graph generated by a 2-layer circuit has a maximum degree of two, and the two edges incident to a vertex must belong to two different layers. A graph with a maximum degree of two can only consist of lines and rings. Every ring must be even, since an odd ring would contain a vertex whose two incident edges belong to the same layer. As a result, the graph is bipartite, since it consists only of lines and even rings.

Theorem 3

Algorithm 2 is a $\frac{5}{2}$-approximation.

Proof: For every two layers of gates in the execution scheme given by Algorithm Para-Finding, the optimal solution must take two braiding cycles, since gates with dependencies cannot be executed simultaneously. According to Lemma 1, our method requires at most five braiding cycles to execute these two layers of gates: three cycles for modifying to the optimal cut type mapping and two cycles for performing braiding operations. Thus, our algorithm is a $\frac{5}{2}$-approximation.
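The counting argument in the proof can be written out for an execution scheme with an even number of layers. The symbols $k$, $T_{\mathrm{opt}}$ and $T_{\mathrm{alg}}$ are introduced here only for illustration: $2k$ is the number of layers, and the two $T$ values are the cycle counts of an optimal schedule and of Algorithm 2.

```latex
\begin{align*}
T_{\mathrm{opt}} &\ge 2k
  && \text{(each pair of dependent layers needs two cycles)} \\
T_{\mathrm{alg}} &\le k\,(3 + 2) = 5k
  && \text{(per pair: three cycles of cut type modification, two of braiding)} \\
\frac{T_{\mathrm{alg}}}{T_{\mathrm{opt}}} &\le \frac{5k}{2k} = \frac{5}{2}
\end{align*}
```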

V Performance Evaluation

TABLE I: Overview of Experiment Results
Circuit $n$ $\alpha$ $g$¹ Autobraid-Min Ecmas-dd-Min Ecmas-dd-ReSu EDPCI-Min EDPCI-4X Ecmas-ls-Min Ecmas-ls-4X
dnn_n8 8 48 192 147 48 48 48 53 48 48
grover 9 110 132 330 166 140 110 110 110 110
qpe_n9 9 42 43 126 70 54 42 42 42 42
BV_10 10 5 5 15 5 5 5 5 5 5
QFT_10 10 93 105 279 165 96 93 93 93 93
adder_n10 10 55 65 165 78 82 55 56 55 55
ising_n10 10 20 90 60 20 20 20 20 24 20
sat_n11 11 204 252 612 336 339 204 204 204 204
square_root_n4 11 221 294 663 379 389 221 225 221 221
multiplier_n15 15 133 222 399 232 244 133 134 133 133
qf21_n15 15 112 115 336 197 130 112 112 112 112
dnn_n16 16 48 384 198 71 48 79 53 68 52
square_root_n18 18 644 898 1932 1047 1133 644 645 644 644
ghz_state_n23 23 22 22 66 22 22 22 22 22 22
multiplier_n25 25 381 670 1143 659 717 383 385 381 381
swap_test_n25 25 63 96 201 89 99 67 65 63 63
wstate_n27 27 28 52 84 28 28 28 28 28 28
BV_50 50 27 27 81 27 27 27 27 27 27
QFT_50 50 2363 2435 7089 4633 2366 2363 2363 2363 2363
ising_n50 50 4 98 15 10 4 6 6 9 7
quantum_walk 11 14104 14372 42312 20188 19669 14104 14104 14104 14104
shor 12 13412 13838 40248 22978 20315 13412 13414 13414 13412

¹ $n$ is the number of qubits, $\alpha$ is the depth of the circuit, and $g$ is the number of CNOT gates in the circuit.

In this section, we first compare the performance of our methods with state-of-the-art methods: AutoBraid [17] for the double defect model and EDPCI [3] for the lattice surgery model. Then we evaluate the performance of Ecmas as the communication resources increase. The details of our evaluation results are shown in Section V-B, and we highlight our key findings as follows:

  • For double defect model, Ecmas outperforms Autobraid[17], reducing the cycle of the transformed circuit by 67.3% at most, on average 51.5%.

  • For lattice surgery model, Ecmas reaches the optimal solution in most of the test benchmarks, reducing the cycle of the transformed circuit by 13.9% at most compared with EDPCI [3].

  • Compared with the result in the minimum viable chip, when the chip size increases 4x, Ecmas reduces the execution time by 10.8% and 30.9% in the double defect model and lattice surgery model respectively.

  • Ecmas exhibits excellent scalability, effectively reducing the execution time of circuits as the chip size increases, while maintaining linear growth in compilation time.

V-A Evaluation Setting

Metrics. We use the number of cycles to represent the communication time, which is used to measure the effectiveness of the compilation results.

Baselines. For double defect and lattice surgery models, we select the state-of-the-art algorithms AutoBraid[17] and EDPCI [3] as our baselines.

Chip Configuration. We evaluate our mapping and scheduling algorithm on three resource configurations $L_{l\times l}$: minimum viable, 4x, and sufficient qubits. For minimum viable qubits, $l=\lceil\sqrt{n}\rceil\times 5d$ for the double defect model and $l=\lceil\sqrt{n}\rceil\times\lceil\sqrt{2}d\rceil$ for the lattice surgery model, which is the smallest square grid chip that provides enough qubits. The 4x resource for lattice surgery is a chip with $l=\lceil\sqrt{n}\rceil\times 5d$. The $l$ for Ecmas-ReSu on sufficient resources depends on the $\mathbb{PM}$ of the circuit.
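The side lengths above can be evaluated with a few lines of Python. The function name and the string labels for the two models are hypothetical, and the code distance $d$ is passed in as a parameter because its value is not fixed in this paragraph.

```python
import math


def min_side_length(n_qubits: int, d: int, model: str) -> int:
    """Side length l of the minimum viable square chip L_{l x l}."""
    # Exact integer ceil(sqrt(n)) for n >= 1, avoiding floating-point error.
    tiles_per_side = math.isqrt(n_qubits - 1) + 1
    if model == "double_defect":
        return tiles_per_side * 5 * d
    if model == "lattice_surgery":
        return tiles_per_side * math.ceil(math.sqrt(2) * d)
    raise ValueError(f"unknown model: {model}")


def four_x_side_length_lattice_surgery(n_qubits: int, d: int) -> int:
    """Side length of the 4x configuration for lattice surgery."""
    return (math.isqrt(n_qubits - 1) + 1) * 5 * d


for n in (8, 9, 10):
    print(n, min_side_length(n, 3, "double_defect"),
          min_side_length(n, 3, "lattice_surgery"))
```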

Benchmarks. We use quantum circuits from previous works, including IBM Qiskit [29], ScaffCC [20], QUEKO [35], QASMbench [24], and random circuits with a specified parallelism degree.

Evaluation Platform. Our experiments are performed on Intel(R) Xeon(R) CPU 6248R 96vCores 3.00GHz, with 256GB DDR4 memory. The operating system is Ubuntu 20.04.

V-B Experiment Results

V-B1 Double Defect Model

We evaluate the performance of Ecmas with two chip configurations: 1) minimum viable chip, and 2) sufficient resources. As shown in Table I, Ecmas outperforms AutoBraid, reducing the number of cycles by up to 67.3% and by 51.5% on average. With Ecmas-ReSu, the number of cycles is further reduced for 40.9% of the circuits, and the average reduction relative to AutoBraid reaches 57.1%. Increasing communication resources addresses the latency caused by braiding congestion. Thus, only circuits suffering from braiding congestion benefit from the increase in bandwidth. The greater the parallelism, the more the execution time decreases as the bandwidth increases. Ecmas-ReSu is not always the best among these results, because it schedules gates under stricter constraints in order to provide its performance guarantee.

V-B2 Lattice Surgery

For most circuits in Table I, our approach obtains the optimal solution, as does EDPCI. For circuits with higher $\mathbb{PM}$, Ecmas achieves better results, reducing the cycle count by up to 13.9% compared with EDPCI. Since Ecmas-ReSu for the lattice surgery model is guaranteed to yield the optimal solution (as described in Section IV-B2), we did not evaluate its performance here. Because our approach lacks specialized optimizations for circuits with specific patterns, it falls behind EDPCI on the circuit ising_n10 on the minimum viable chip, whose CNOT gates are adjacent in the snake mapping. However, the absence of an effective initial mapping hampers EDPCI's capacity to capitalize on increased physical qubit resources. Our approach can leverage additional chip resources to reduce circuit depth: all of our results on the 4x resources are superior or equal to those on the minimum viable chip.
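The headline percentages in Sections V-B1 and V-B2 can be checked against individual rows of Table I. The sketch below recomputes the relative cycle reduction for a few rows copied from the table: the double defect rows compare the Autobraid column with the Ecmas-dd minimum viable column, and the lattice surgery rows compare the two minimum viable columns. The helper name `cycle_reduction` is hypothetical, and only a subset of rows is included.

```python
def cycle_reduction(baseline_cycles, new_cycles):
    """Fractional reduction in cycles relative to the baseline."""
    return (baseline_cycles - new_cycles) / baseline_cycles


# (circuit, baseline cycles, Ecmas cycles), copied from Table I.
double_defect = [("dnn_n8", 147, 48), ("grover", 330, 166), ("BV_10", 15, 5)]
lattice_surgery = [("dnn_n16", 79, 68), ("swap_test_n25", 67, 63)]

best_dd = max(double_defect, key=lambda r: cycle_reduction(r[1], r[2]))
best_ls = max(lattice_surgery, key=lambda r: cycle_reduction(r[1], r[2]))

for name, base, ours in (best_dd, best_ls):
    print(name, round(100 * cycle_reduction(base, ours), 1))
```

Among the rows listed, dnn_n8 gives 67.3% for the double defect model and dnn_n16 gives 13.9% for the lattice surgery model, which matches the maxima quoted in the text.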

V-C Sensitivity Study

In this section, we conduct sensitivity studies to investigate the impact of our strategies, i.e., location and cut type initialization, gate prioritizing, and cut type scheduling. We also analyze the scalability of circuit parallelism and chip size. These results demonstrate that our method outperforms baselines on most quantum application circuits, especially those with medium to high parallelism. Moreover, our algorithm effectively utilizes the redundant physical qubit resources on the chip to reduce the circuit cycle.

V-C1 Initialization Method

We examine the impact of initialization on the execution time in location and cut type. These evaluations and the subsequent scheduling experiments are conducted on the minimum viable chip.

Location: As illustrated in Table II, our method matches or outperforms the alternatives on most circuits. We compare the influence of tile location selection on circuit depth with the trivial mapping in EDPCI [3] and the Metis mapping [21]. The trivial method refers to a snake-like layout of logical qubits, where the qubits in the first row are placed from left to right, followed by the qubits in the second row placed from right to left, and so on until all logical qubits are mapped. The shortcomings on the Ising circuits primarily result from the absence of specialized optimization targeting specific patterns. Nevertheless, the overall trend underscores our approach's robustness across diverse scenarios.

TABLE II: Comparison of location initialization methods
Circuit name $n$ $\alpha$ $g$ Trivial Metis Ours
dnn_n8 8 48 192 48 72 48
grover 9 110 132 112 110 110
qpe_n9 9 42 43 42 42 42
ising_n10 10 20 90 20 36 20
adder_n10 10 55 65 55 55 55
QFT_10 10 93 105 95 93 93
multiply_n13 13 23 40 25 24 23
square_root_n18 18 644 898 644 644 644
ghz_state_n23 23 22 22 22 22 22
swap_test_n25 25 63 96 63 63 63
ising_n50 50 4 98 4 11 9

Cut type: In most cases, our method outperforms the baseline methods, as shown in Table III. We compare our cut type initialization algorithm with the random and max-cut algorithms. The random method assigns each tile a random cut type. The max-cut method maximizes the number of CNOT gates whose tiles have different cut types. We use the one_exchange method in NetworkX to implement the max-cut partition. For specific circuits such as ghz_state_n23, our initialization algorithm can significantly reduce the number of cycles. This is because the max-cut method aims to reduce the overall number of CNOT gates with the same cut type. However, the cut type of tiles is dynamic, since it can be modified during execution. The initialization method should therefore emphasize the front part of the quantum circuit.

TABLE III: Comparison of cut type initialization methods
Circuit name $n$ $\alpha$ $g$ Random Max-cut Ours
dnn_n8 8 48 192 64 48 48
grover 9 110 132 173 172 166
qpe_n9 9 42 43 73 76 70
ising_n10 10 20 90 37 29 20
adder_n10 10 55 65 85 82 78
QFT_10 10 93 105 171 173 165
multiply_n13 13 23 40 39 37 35
square_root_n18 18 644 898 1052 1053 1047
ghz_state_n23 23 22 22 48 40 22
swap_test_n25 25 63 96 120 94 89
ising_n50 50 4 98 11 10 10

V-C2 Scheduling Strategy

We investigate our methods from two perspectives: gate scheduling and cut-type scheduling.

Gate scheduling: According to the results in Table IV, our method achieves optimal solutions in most benchmarks. We compare our gate scheduling method with the circuit-order approach in the lattice surgery model, where circuit-order denotes scheduling gates in the order in which they appear in the circuit. Compared with circuit-order, our method reduces the execution time by up to 23%.

TABLE IV: Comparison of different gate scheduling algorithms
Circuit name $n$ $\alpha$ $g$ Circuit-order Ours
dnn_n8 8 48 192 66 54
grover 9 110 132 112 114
qpe_n9 9 42 43 42 42
ising_n10 10 20 90 26 20
adder_n10 10 55 65 55 55
QFT_10 10 93 105 93 93
multiply_n13 13 23 40 24 23
square_root_n18 18 644 898 644 644
ghz_state_n23 23 22 22 22 22
swap_test_n25 25 63 96 63 63
ising_n50 50 4 98 9 9
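
As a worked check of the 23% figure, the largest gap in Table IV is on ising_n10, where circuit-order needs 26 cycles and our method needs 20:

```latex
\[
\frac{26 - 20}{26} = \frac{6}{26} \approx 23.1\%
\]
```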

Cut type scheduling: As shown in Table V, our algorithm outperforms the best baseline strategy on these benchmarks, achieving an average reduction of 25% and a maximum of 50%. We compare the cycle count of our method with two other strategies: Time-first and Channel-first. These two strategies determine whether to modify the cut type when dealing with a CNOT gate whose tiles have the same cut type. The former chooses the operations that complete the CNOT gate as soon as possible, while the latter minimizes the channel occupation of this CNOT gate. The improvement stems from adaptively adjusting the weights of time and channel based on resource conditions, which lets our strategy perform well in most scenarios.

TABLE V: Comparison of different cut type scheduling
Circuit name $n$ $\alpha$ $g$ Channel-first Time-first Ours
dnn_n8 8 48 192 48 48 48
grover 9 110 132 166 174 110
qpe_n9 9 42 43 70 96 42
ising_n10 10 20 90 20 20 20
adder_n10 10 55 65 78 88 55
QFT_10 10 93 105 165 120 93
multiply_n13 13 23 40 35 41 23
square_root_n18 18 644 898 1047 1117 644
ghz_state_n23 23 22 22 22 22 22
swap_test_n25 25 63 96 89 102 63
ising_n50 50 4 98 8 8 4

V-C3 Scalability

We explore the effectiveness of Ecmas on various input quantum circuits and chip sizes. Determining the parallelism of a given quantum circuit is challenging, but generating quantum circuits with specified parallelism is feasible. Inspired by QUEKO [35], we generate 50 random quantum circuits (as a test group) with 49 qubits, 50 depth, and parallelism ranging from 1 to 21. We use the average number of cycles in each group as the result.

Figure 11: Effect of circuit parallelism: (a) lattice surgery model, (b) double defect model.

Scalability of Circuit Parallelism Degree: In the lattice surgery model, our approach generally outperforms EDPCI, particularly on circuits with parallelism 3 to 13. For circuits with high $\mathbb{PM}$, our method is slightly less effective than EDPCI, because our algorithm is more likely to get trapped in local optima in these cases. In the double defect model, the optimization ratio increases from 43% to 62.9% as the Circuit Parallelism Degree increases from 1 to 21, as shown in Fig. 11(b). This is mainly attributed to our scheduling strategy for the cut type, which effectively leverages the waiting time due to path conflicts. We save significant channel resources by adjusting the cut type when the tile cut types are the same.

Figure 12: Effect of chip size

Scalability of Chip Size: Fig. 12 illustrates the trends of Ecmas's performance (cycles) and efficiency (compiling time ratio) as the chip size increases. The compiling time ratio is $\tau_{(i,j)}/\tau_{(i,\min)}$, where $\tau_{(i,\min)}$ is the compiling time of circuits with parallelism $i$ on the minimum viable chip and $\tau_{(i,j)}$ is the compiling time of circuits with parallelism $i$ at chip size $j$. We use this ratio to compare scalability fairly, because the three algorithms, Ecmas, Autobraid, and EDPCI, were programmed in Python, C++, and Julia, respectively. The chip is a square with the average bandwidth per channel ranging from 1 to 5. The results demonstrate that the circuit cycles of our method decrease as the chip size increases. The execution time of the circuits with $\mathbb{PM}=21$ decreases by 10.8% for the double defect model and by 30.9% for the lattice surgery model when the average bandwidth of the chip rises from 1 to 2.

VI Related Work

Most existing quantum compilers [40, 33, 25, 38] focus on physical-qubit-level compilation designed for NISQ circuits with 50 to 200 qubits, which is not fault-tolerant. These works convert a logical circuit into a hardware-dependent physical circuit in which CNOT gates are applied only to physical qubits that are connected in the hardware.

Fault-tolerant compilation primarily focuses on architectures based on surface code, as it is the most promising error-correcting code for superconducting quantum computers. The fundamental difference between compiling a surface code circuit and a NISQ circuit is the separation of communication resources (channels) from computational resources (tiles). The resources are software-defined and can be specialized for specific circuits. A CNOT gate is no longer executed by moving the data to two physically adjacent physical qubits. Instead, it takes place within the channel, using exclusive access to communication resources. Depending on the method of constructing logical qubits, surface code can be divided into double defect [10] and lattice surgery [16], with different strategies for implementing logical operations. Double defect employs the braiding technique to perform CNOT gates. Braidflash [19] abstracts the constraints of CNOT gate implementation into the disjointness of braiding paths. Autobraid [17] further discovers the local parallelism pattern and designs a stack-based search algorithm that efficiently finds as many parallel CNOT gates as possible. Lattice surgery is a more recent surface code approach that uses fewer physical qubits to encode a logical qubit. It utilizes ZZ measurements for CNOT gates [26]. EDPCI [3] achieves long-range CNOT gates by utilizing ancilla tiles to construct Bell states. This approach requires a fourfold increase in physical qubits but enables CNOT gates at arbitrary distances to complete within $2d$ surface code cycles. However, it does not account for the impact of the initial mapping. Disregarding the circuit communication requirements and using a trivial initial mapping results in a paradoxical situation where the circuit’s performance worsens as chip resources increase.
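The path-disjointness constraint can be illustrated with a toy sketch. It assumes a simplified model that is not the algorithm of Braidflash, Autobraid, or Ecmas: the chip is a `size` x `size` grid of cells, each CNOT gate is a pair of endpoint cells, and gates may run in the same round only if their routing paths share no cell. The helper names `bfs_path` and `greedy_disjoint_round` are hypothetical.

```python
from collections import deque


def bfs_path(size, src, dst, blocked):
    """Shortest path of cells from src to dst avoiding blocked cells, or None."""
    if src in blocked or dst in blocked:
        return None
    parent = {src: None}
    queue = deque([src])
    while queue:
        cell = queue.popleft()
        if cell == dst:
            path = []
            while cell is not None:  # walk back to src
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nxt[0] < size and 0 <= nxt[1] < size
            if inside and nxt not in blocked and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None


def greedy_disjoint_round(size, gates):
    """Greedily route gates in order; conflicting gates wait for a later round."""
    endpoints = {cell for gate in gates for cell in gate}
    used = set()  # cells already claimed by a routed path
    routed, deferred = {}, []
    for src, dst in gates:
        # A path may not cross claimed cells or the endpoints of other gates.
        blocked = used | (endpoints - {src, dst})
        path = bfs_path(size, src, dst, blocked)
        if path is None:
            deferred.append((src, dst))
        else:
            routed[(src, dst)] = path
            used.update(path)
    return routed, deferred


# Two crossing gates conflict; two gates on separate rows do not.
crossing = greedy_disjoint_round(3, [((1, 0), (1, 2)), ((0, 1), (2, 1))])
parallel = greedy_disjoint_round(4, [((0, 0), (0, 3)), ((3, 0), (3, 3))])
```

In this toy model the deferred gates correspond to the waiting time caused by path conflicts; a real compiler would also model channel bandwidth and tile cut types, which are omitted here.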

Other works on fault-tolerant quantum compilation have focused on synergy with chip characteristics. Wu et al. [36] propose a lattice encoding method for superconducting chips, which adapts various chip structures to the surface code’s 2D lattice. Previous works [27, 6] adapt the surface code to hexagonal chips, reducing chip connectivity to improve the accuracy of physical qubits. Some efforts [27] focus on utilizing operations with lower error rates during the compilation process to enhance circuit accuracy. Pattison et al. [28], on the other hand, concatenate surface code with a high-rate code, such as a quantum LDPC code, to address the challenges of low code rate and limited scalability in surface code implementations.

VII Conclusion

In this paper, we study the surface code mapping and scheduling problem for the lattice surgery and double defect models. We formalize the problems in both models and establish the problem’s complexity, particularly highlighting challenges in the double defect model. We introduce Circuit Parallelism Degree and Chip Communication Capacity to quantitatively analyze quantum circuits and quantum chips. Our mapping and scheduling methods, named Ecmas, feature algorithms for scenarios with sufficient and limited qubit resources. Extensive evaluations show that Ecmas significantly improves on the state-of-the-art approaches, reducing the execution time by 33.3% to 67.3% for the double defect model and by up to 13.9% for the lattice surgery model.

Limitation and future work: Although Circuit Parallelism Degree and Chip Communication Capacity are critical parameters for our mapping and scheduling methods, we still lack effective algorithms to compute them accurately. In addition, our cost function for determining the importance of the current gate is less effective for highly parallel circuits than for circuits with lower parallelism. In future work, we anticipate dynamic transformation strategies that modify mappings during the transformation process. Moreover, the complexity of the scheduling problems and the bounds of chip communication capacity are still open problems.

Acknowledgements

The research is partially supported by the National Key R&D Program of China under Grant No. 2021ZD0110400, the Innovation Program for Quantum Science and Technology under Grant No. 2021ZD0302901, the Anhui Initiative in Quantum Information Technologies under Grant No. AHY150300, the China National Natural Science Foundation under Grant No. 62132018, and the "Pioneer" and "Leading Goose" R&D Program of Zhejiang under Grant Nos. 2023C01029 and 2023C01143. Xiang-Yang Li is the corresponding author (Email: [email protected]).

Appendix A Proof of Theorem 1

We prove that any instance of the 3-SAT problem can be reduced to an instance of the surface code initialization problem in polynomial time. For any $n$-clause 3-SAT instance, we can construct a corresponding quantum circuit such that the 3-SAT instance is satisfiable if and only if there exists an initialization that executes this circuit in no more than $10+3n$ cycles. Here, we assume that the bandwidth of the channel is sufficient. The quantum circuit, shown in Fig. 13e, is constructed as follows:

For each three-literal clause $C_i$, construct a sub-circuit with $8$ qubits: $q_{ia}, q_{ib}, q_{ic}$ represent the three literals $a, b, c$ in the clause, $q_{ia'}, q_{ib'}, q_{ic'}$ are the corresponding ancilla qubits, and $q_{iT}, q_{iF}$ represent the logical qubits initialized into an X-cut tile and a Z-cut tile. If the clause’s first literal $a$ is positive, we add a CNOT gate between $q_{ia}$ and $q_{iT}$, otherwise between $q_{ia}$ and $q_{iF}$. Then we add a CNOT gate between $q_{iT}$ and $q_{iF}$. After that, we add two ancilla CNOT gates, one between $q_{ib}$ and $q_{ib'}$ and one between $q_{ic}$ and $q_{ic'}$. For the second and third literals $b$ and $c$, the circuit is constructed in the same way. For example, the sub-circuit corresponding to the clause $(a \lor \lnot b \lor c)$ is shown in Fig. 13a. If the clause is true, a tile mapping exists that allows the depth of this sub-circuit to be no more than 10. The cut type of each tile is mapped one-to-one to the true or false value of the corresponding literal. Here, the black gates are used as placeholders so that the tiles do not have time to change their tile type within 10 cycles.
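The clause construction can be sketched as a small gate-list builder. This is only one reading of the prose, under stated assumptions: the qubit labels are plain strings, each gate is an unordered pair because the text does not fix control and target, the black placeholder gates of Fig. 13a are omitted, and the ancilla gates for a given literal are taken to act on the other two literals. The function names `clause_subcircuit` and `cycle_bound` are hypothetical.

```python
def clause_subcircuit(i, clause):
    """Build the CNOT pairs for clause C_i (placeholder gates omitted).

    clause is a list of three (name, is_positive) literals,
    e.g. [("a", True), ("b", False), ("c", True)] for (a or not b or c).
    """
    names = [name for name, _ in clause]
    q_true, q_false = f"q{i}T", f"q{i}F"
    gates = []
    for name, positive in clause:
        # Literal qubit interacts with q_iT if positive, else with q_iF.
        gates.append((f"q{i}{name}", q_true if positive else q_false))
        # Then a CNOT between q_iT and q_iF.
        gates.append((q_true, q_false))
        # Then ancilla CNOTs on the two other literals of the clause.
        for other in names:
            if other != name:
                gates.append((f"q{i}{other}", f"q{i}{other}'"))
    return gates


def cycle_bound(n_clauses):
    """Cycle threshold 10 + 3n used in the reduction for n clauses."""
    return 10 + 3 * n_clauses


example_gates = clause_subcircuit(1, [("a", True), ("b", False), ("c", True)])
for gate in example_gates:
    print(gate)
```

The figure remains the authoritative description of the gate order and of the placeholder gates; the sketch only makes the per-literal pattern of four CNOT gates explicit.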

Each three-literal clause generates a corresponding sub-circuit. We connect these sub-circuits in parallel, and the depth of the combined circuit is no more than $10$ if and only if all the clauses are true. Next, we must ensure that the same literal in different clauses corresponds to the same cut type. This is achieved through the sub-circuit in Fig. 13b. We declare an ideal literal and let the literal in different clauses perform CNOT gates with it. The circuit can achieve its shortest depth only if they are of a different type from the ideal literal. Here, the black gates serve as placeholder operations in case a tile modifies its tile type, and they pad the circuit so that the shortest depths corresponding to different literals are $n$. For the ideal True and False, we require an additional $n$-depth sub-circuit, shown in Fig. 13c, that makes their cut types different. The sub-circuit in Fig. 13d ensures that the ideal literal does not modify its cut type while waiting for the clause sub-circuits to execute.

Figure 13: Quantum circuit construction for an $n$-clause 3-SAT problem

References

  • [1] “Suppressing quantum errors by scaling a surface code logical qubit,” Nature, vol. 614, no. 7949, pp. 676–681, 2023.
  • [2] F. Arute, K. Arya, R. Babbush, D. Bacon, J. C. Bardin, R. Barends, R. Biswas, S. Boixo, F. G. Brandao, D. A. Buell et al., “Quantum supremacy using a programmable superconducting processor,” Nature, vol. 574, no. 7779, pp. 505–510, 2019.
  • [3] M. Beverland, V. Kliuchnikov, and E. Schoute, “Surface code compilation via edge-disjoint paths,” PRX Quantum, vol. 3, no. 2, p. 020342, 2022. [Online]. Available: https://github.com/eddieschoute/TeleportRouter.jl
  • [4] S. Bravyi and J. Haah, “Magic-state distillation with low overhead,” Physical Review A, vol. 86, no. 5, p. 052329, 2012.
  • [5] S. B. Bravyi and A. Y. Kitaev, “Quantum codes on a lattice with boundary,” arXiv preprint quant-ph/9811052, 1998.
  • [6] C. Chamberland, G. Zhu, T. J. Yoder, J. B. Hertzberg, and A. W. Cross, “Topological and subsystem codes on low-degree graphs with flag qubits,” Physical Review X, vol. 10, no. 1, p. 011022, 2020.
  • [7] E. Dennis, A. Kitaev, A. Landahl, and J. Preskill, “Topological quantum memory,” Journal of Mathematical Physics, vol. 43, no. 9, pp. 4452–4505, 2002.
  • [8] Y. Ding, A. Holmes, A. Javadi-Abhari, D. Franklin, M. Martonosi, and F. Chong, “Magic-state functional units: Mapping and scheduling multi-level distillation circuits for fault-tolerant quantum architectures,” in 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2018, pp. 828–840.
  • [9] G. Finke, P. Lemaire, J.-M. Proth, and M. Queyranne, “Minimizing the number of machines for minimum length schedules,” European Journal of Operational Research, vol. 199, no. 3, pp. 702–705, 2009.
  • [10] A. G. Fowler, M. Mariantoni, J. M. Martinis, and A. N. Cleland, “Surface codes: Towards practical large-scale quantum computation,” Physical Review A, vol. 86, no. 3, p. 032324, 2012.
  • [11] I. M. Georgescu, S. Ashhab, and F. Nori, “Quantum simulation,” Reviews of Modern Physics, vol. 86, no. 1, p. 153, 2014.
  • [12] C. Gidney and M. Ekerå, “How to factor 2048 bit rsa integers in 8 hours using 20 million noisy qubits,” Quantum, vol. 5, p. 433, 2021.
  • [13] L. K. Grover, “A fast quantum mechanical algorithm for database search,” in Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, 1996, pp. 212–219.
  • [14] V. Havlíček, A. D. Córcoles, K. Temme, A. W. Harrow, A. Kandala, J. M. Chow, and J. M. Gambetta, “Supervised learning with quantum-enhanced feature spaces,” Nature, vol. 567, no. 7747, pp. 209–212, 2019.
  • [15] D. Herr, F. Nori, and S. J. Devitt, “Optimization of lattice surgery is np-hard,” Npj quantum information, vol. 3, no. 1, p. 35, 2017.
  • [16] C. Horsman, A. G. Fowler, S. Devitt, and R. Van Meter, “Surface code quantum computing by lattice surgery,” New Journal of Physics, vol. 14, no. 12, p. 123011, 2012.
  • [17] F. Hua, Y. Chen, Y. Jin, C. Zhang, A. Hayes, Y. Zhang, and E. Z. Zhang, “Autobraid: A framework for enabling efficient surface code communication in quantum computing,” in MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021, pp. 925–936. [Online]. Available: https://github.com/huafei1137/Autobraid
  • [18] H.-Y. Huang, M. Broughton, M. Mohseni, R. Babbush, S. Boixo, H. Neven, and J. R. McClean, “Power of data in quantum machine learning,” Nature communications, vol. 12, no. 1, pp. 1–9, 2021.
  • [19] A. Javadi-Abhari, P. Gokhale, A. Holmes, D. Franklin, K. R. Brown, M. Martonosi, and F. T. Chong, “Optimized surface code communication in superconducting quantum computers,” in Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017, pp. 692–705.
  • [20] A. JavadiAbhari, S. Patil, D. Kudrow, J. Heckey, A. Lvov, F. T. Chong, and M. Martonosi, “Scaffcc: A framework for compilation and analysis of quantum computing programs,” in Proceedings of the 11th ACM Conference on Computing Frontiers, 2014, pp. 1–10.
  • [21] G. Karypis, “Metis: Unstructured graph partitioning and sparse matrix ordering system,” Technical report, 1997.
  • [22] A. Y. Kitaev, “Fault-tolerant quantum computation by anyons,” Annals of physics, vol. 303, no. 1, pp. 2–30, 2003.
  • [23] S. Krinner, N. Lacroix, A. Remm, A. Di Paolo, E. Genois, C. Leroux, C. Hellings, S. Lazar, F. Swiadek, J. Herrmann et al., “Realizing repeated quantum error correction in a distance-three surface code,” Nature, vol. 605, no. 7911, pp. 669–674, 2022.
  • [24] A. Li, S. Stein, S. Krishnamoorthy, and J. Ang, “Qasmbench: A low-level quantum benchmark suite for nisq evaluation and simulation,” ACM Transactions on Quantum Computing, 2022.
  • [25] G. Li, Y. Ding, and Y. Xie, “Tackling the qubit mapping problem for nisq-era quantum devices,” in Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019, pp. 1001–1014.
  • [26] D. Litinski, “A game of surface codes: Large-scale quantum computing with lattice surgery,” Quantum, vol. 3, p. 128, 2019.
  • [27] M. McEwen, D. Bacon, and C. Gidney, “Relaxing hardware requirements for surface code circuits using time-dynamics,” arXiv preprint arXiv:2302.02192, 2023.
  • [28] C. A. Pattison, A. Krishna, and J. Preskill, “Hierarchical memories: Simulating quantum ldpc codes with local gates,” arXiv preprint arXiv:2303.04798, 2023.
  • [29] Qiskit contributors, “Qiskit: An open-source framework for quantum computing,” 2023.
  • [30] M. Schuld and N. Killoran, “Quantum machine learning in feature hilbert spaces,” Physical review letters, vol. 122, no. 4, p. 040504, 2019.
  • [31] P. W. Shor, “Algorithms for quantum computation: discrete logarithms and factoring,” in Proceedings 35th annual symposium on foundations of computer science.   Ieee, 1994, pp. 124–134.
  • [32] P. W. Shor, “Fault-tolerant quantum computation,” in Proceedings of 37th Conference on Foundations of Computer Science.   IEEE, 1996, pp. 56–65.
  • [33] M. Y. Siraichi, V. F. d. Santos, C. Collange, and F. M. Q. Pereira, “Qubit allocation,” in Proceedings of the 2018 International Symposium on Code Generation and Optimization, 2018, pp. 113–125.
  • [34] S. A. Stein, B. Baheri, D. Chen, Y. Mao, Q. Guan, A. Li, S. Xu, and C. Ding, “Quclassi: A hybrid deep neural network architecture based on quantum state fidelity,” Proceedings of Machine Learning and Systems, vol. 4, pp. 251–264, 2022.
  • [35] B. Tan and J. Cong, “Optimality study of existing quantum computing layout synthesis tools,” IEEE Transactions on Computers, vol. 70, no. 9, pp. 1363–1373, 2020.
  • [36] A. Wu, G. Li, H. Zhang, G. G. Guerreschi, Y. Ding, and Y. Xie, “A synthesis framework for stitching surface code with superconducting quantum devices,” in Proceedings of the 49th Annual International Symposium on Computer Architecture, 2022, pp. 337–350.
  • [37] Y. Wu, W.-S. Bao, S. Cao, F. Chen, M.-C. Chen, X. Chen, T.-H. Chung, H. Deng, Y. Du, D. Fan et al., “Strong quantum computational advantage using a superconducting quantum processor,” Physical review letters, vol. 127, no. 18, p. 180501, 2021.
  • [38] C. Zhang, A. B. Hayes, L. Qiu, Y. Jin, Y. Chen, and E. Z. Zhang, “Time-optimal qubit mapping,” in Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021, pp. 360–374.
  • [39] Y. Zhao, Y. Ye, H.-L. Huang, Y. Zhang, D. Wu, H. Guan, Q. Zhu, Z. Wei, T. He, S. Cao et al., “Realization of an error-correcting surface code with superconducting qubits,” Physical Review Letters, vol. 129, no. 3, p. 030501, 2022.
  • [40] A. Zulehner, A. Paler, and R. Wille, “An efficient methodology for mapping quantum circuits to the ibm qx architectures,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 38, no. 7, pp. 1226–1236, 2018.