Search | arXiv e-print repository

Linear Depth QFT over IBM Heavy-hex Architecture

Authors: Xiangyu Gao, Yuwei Jin, Minghao Guo, Henry Chen, Eddy Z. Zhang

Abstract: Compiling a given quantum algorithm into a target hardware architecture is a challenging optimization problem. The compiler must take into consideration the coupling graph of physical qubits and the gate operation dependencies. The existing noise in hardware architectures requires the compilation to use as few running cycles as possible. Existing approaches include using SAT solver or heuristics to complete the mapping but these may cause the issue of either long compilation time (e.g., timeout after hours) or suboptimal compilation results in terms of running cycles (e.g., exponentially increasing number of total cycles). In this paper, we propose an efficient mapping approach for Quantum Fourier Transformation (QFT) circuits over the existing IBM heavy-hex architecture. Such proposal first of all turns the architecture into a structure consisting of a straight line with dangling qubits, and then do the mapping over this generated structure recursively. The calculation shows that there is a linear depth upper bound for the time complexity of these structures and for a special case where there is 1 dangling qubit in every 5 qubits, the time complexity is 5N+O(1). All these results are better than state of the art methods.

Submitted 14 February, 2024; originally announced February 2024.

arXiv:2207.05751 [pdf, other]

A Synergistic Compilation Workflow for Tackling Crosstalk in Quantum Machines

Authors: Fei Hua, Yuwei Jin, Ang Li, Chenxu Liu, Meng Wang, Yanhao Chen, Chi Zhang, Ari Hayes, Samuel Stein, Minghao Guo, Yipeng Huang, Eddy Z. Zhang

Abstract: Near-term quantum systems tend to be noisy. Crosstalk noise has been recognized as one of several major types of noises in superconducting Noisy Intermediate-Scale Quantum (NISQ) devices. Crosstalk arises from the concurrent execution of two-qubit gates on nearby qubits, such as CX. It might significantly raise the error rate of gates in comparison to running them individually. Crosstalk can be mitigated through scheduling or hardware machine tuning. Prior scientific studies, however, manage crosstalk at a really late phase in the compilation process, usually after hardware mapping is done. It may miss great opportunities of optimizing algorithm logic, routing, and crosstalk at the same time. In this paper, we push the envelope by considering all these factors simultaneously at the very early compilation stage. We propose a crosstalk-aware quantum program compilation framework called CQC that can enhance crosstalk mitigation while achieving satisfactory circuit depth. Moreover, we identify opportunities for translation from intermediate representation to the circuit for application-specific crosstalk mitigation, for instance, the CX ladder construction in variational quantum eigensolvers (VQE). Evaluations through simulation and on real IBM-Q devices show that our framework can significantly reduce the error rate by up to 6×, with only ~60% circuit depth compared to state-of-the-art gate scheduling approaches. In particular, for VQE, we demonstrate 49% circuit depth reduction with 9.6% fidelity improvement over prior art on the H4 molecule using IBMQ Guadalupe. Our CQC framework will be released on GitHub.

Submitted 8 December, 2023; v1 submitted 12 July, 2022; originally announced July 2022.

arXiv:2011.13080 [pdf, other]

Photoacoustic Reconstruction Using Sparsity in Curvelet Frame: Image versus Data Domain

Authors: Bolin Pan, Simon R. Arridge, Felix Lucka, Ben T. Cox, Nam Huynh, Paul C. Beard, Edward Z. Zhang, Marta M. Betcke

Abstract: Curvelet frame is of special significance for photoacoustic tomography (PAT) due to its sparsifying and microlocalisation properties. We derive a one-to-one map between wavefront directions in image and data spaces in PAT which suggests near equivalence between the recovery of the initial pressure and PAT data from compressed/subsampled measurements when assuming sparsity in Curvelet frame. As the latter is computationally more tractable, investigation to which extent this equivalence holds conducted in this paper is of immediate practical significance. To this end we formulate and compare DR, a two step approach based on the recovery of the complete volume of the photoacoustic data from the subsampled data followed by the acoustic inversion, and p0R, a one step approach where the photoacoustic image (the initial pressure, p0) is directly recovered from the subsampled data. Effective representation of the photoacoustic data requires basis defined on the range of the photoacoustic forward operator. To this end we propose a novel wedge-restriction of Curvelet transform which enables us to construct such basis. Both recovery problems are formulated in a variational framework. As the Curvelet frame is heavily overdetermined, we use reweighted l1 norm penalties to enhance the sparsity of the solution. The data reconstruction problem DR is a standard compressed sensing recovery problem, which we solve using an ADMM-type algorithm, SALSA. Subsequently, the initial pressure is recovered using time reversal as implemented in the k-Wave Toolbox. The p0 reconstruction problem, p0R, aims to recover the photoacoustic image directly via FISTA, or ADMM when in addition including a non-negativity constraint. We compare and discuss the relative merits of the two approaches and illustrate them on 2D simulated and 3D real data in a fair and rigorous manner.

Submitted 6 August, 2021; v1 submitted 25 November, 2020; originally announced November 2020.

Comments: 06 August 2021 (Accepted Version)

arXiv:2009.02346 [pdf, other]

SlackQ: Approaching the Qubit Mapping Problem with A Slack-aware Swap Insertion Scheme

Authors: Chi Zhang, Yanhao Chen, Yuwei Jin, Wonsun Ahn, Youtao Zhang, Eddy Z. Zhang

Abstract: The rapid progress of physical implementation of quantum computers paved the way for the design of tools to help users write quantum programs for any given quantum device. The physical constraints inherent in current NISQ architectures prevent most quantum algorithms from being directly executed on quantum devices. To enable two-qubit gates in the algorithm, existing works focus on inserting SWAP gates to dynamically remap logical qubits to physical qubits. However, their schemes lack consideration of the execution time of generated quantum circuits. In this work, we propose a slack-aware SWAP insertion scheme for the qubit mapping problem in the NISQ era. Our experiments show performance improvement by up to 2.36X at maximum, by 1.62X on average, over 106 representative benchmarks from RevLib, IBM Qiskit, and ScaffCC.

Submitted 4 September, 2020; originally announced September 2020.

arXiv:2002.07289 [pdf, other]

A Depth-Aware Swap Insertion Scheme for the Qubit Mapping Problem

Authors: Chi Zhang, Yanhao Chen, Yuwei Jin, Wonsun Ahn, Youtao Zhang, Eddy Z. Zhang

Abstract: The rapid progress of physical implementation of quantum computers paved the way of realising the design of tools to help users write quantum programs for any given quantum devices. The physical constraints inherent to the current NISQ architectures prevent most quantum algorithms from being directly executed on quantum devices. To enable two-qubit gates in the algorithm, existing works focus on inserting SWAP gates to dynamically remap logical qubits to physical qubits. However, their schemes lack the consideration of the depth of generated quantum circuits. In this work, we propose a depth-aware SWAP insertion scheme for qubit mapping problem in the NISQ era.

Submitted 17 February, 2020; originally announced February 2020.

arXiv:1906.06504 [pdf, other]

Accelerating Concurrent Heap on GPUs

Authors: Yanhao Chen, Fei Hua, Chaozhang Huang, Jeremy Bierema, Chi Zhang, Eddy Z. Zhang

Abstract: Priority queue, often implemented as a heap, is an abstract data type that has been used in many well-known applications like Dijkstra's shortest path algorithm, Prim's minimum spanning tree, Huffman encoding, and the branch-and-bound algorithm. However, it is challenging to exploit the parallelism of the heap on GPUs since the control divergence and memory irregularity must be taken into account. In this paper, we present a parallel generalized heap model that works effectively on GPUs. We also prove the linearizability of our generalized heap model which enables us to reason about the expected results. We evaluate our concurrent heap thoroughly and show a maximum 19.49X speedup compared to the sequential CPU implementation and 2.11X speedup compared with the existing GPU implementation. We also apply our heap to single source shortest path with up to 1.23X speedup and 0/1 knapsack problem with up to 12.19X speedup.

Submitted 15 June, 2019; originally announced June 2019.

arXiv:1605.02043 [pdf, other]

A Graph-based Model for GPU Caching Problems

Authors: Lingda Li, Ari B. Hayes, Stephen A. Hackler, Eddy Z. Zhang, Mario Szegedy, Shuaiwen Leon Song

Abstract: Modeling data sharing in GPU programs is a challenging task because of the massive parallelism and complex data sharing patterns provided by GPU architectures. Better GPU caching efficiency can be achieved through careful task scheduling among different threads. Traditionally, in the field of parallel computing, graph partition models are used to model data communication and guide task scheduling. However, we discover that the previous methods are either inaccurate or expensive when applied to GPU programs. In this paper, we propose a novel task partition model that is accurate and gives rise to the development of fast and high quality task/data reorganization algorithms. We demonstrate the effectiveness of the proposed model by rigorous theoretical analysis of the algorithm bounds and extensive experimental analysis. The experimental results show that it achieves significant performance improvement across a representative set of GPU applications.

Submitted 6 May, 2016; originally announced May 2016.

Comments: Currently under submission

Showing 1–7 of 7 results for author: Zhang, E Z