
Dual-Pipeline with Low-Rank Adaptation for New Language Integration in Multilingual ASR

Authors: Yerbolat Khassanov, Zhipeng Chen, Tianfeng Chen, Tze Yuang Chong, Wei Li, Jun Zhang, Lu Lu, Yuxuan Wang

Abstract: This paper addresses challenges in integrating new languages into a pre-trained multilingual automatic speech recognition (mASR) system, particularly in scenarios where training data for existing languages is limited or unavailable. The proposed method employs a dual-pipeline with low-rank adaptation (LoRA). It maintains two data flow pipelines: one for existing languages and another for new languages. The primary pipeline follows the standard flow through the pre-trained parameters of mASR, while the secondary pipeline additionally utilizes language-specific parameters represented by LoRA and a separate output decoder module. Importantly, the proposed approach minimizes the performance degradation of existing languages and enables a language-agnostic operation mode, facilitated by a decoder selection strategy. We validate the effectiveness of the proposed method by extending the pre-trained Whisper model to 19 new languages from the FLEURS dataset.
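
Illustrative sketch (not the authors' code): the two data flows described in the abstract can be pictured on a single linear layer, where the primary pipeline uses only the frozen pre-trained weight and the secondary pipeline adds a low-rank LoRA update. The class name `DualPipelineLinear`, the `new_language` flag, and the `rank`/`alpha` defaults are assumptions made for this example; the separate output decoder and the decoder selection strategy mentioned in the abstract are not modelled here.

```python
import numpy as np


class DualPipelineLinear:
    """Frozen linear layer with an optional LoRA branch (hypothetical names)."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight  # pre-trained weight, kept frozen
        # LoRA factors: A is small random, B starts at zero so the
        # secondary pipeline initially matches the primary one.
        self.lora_a = rng.normal(0.0, 0.01, size=(rank, in_dim))
        self.lora_b = np.zeros((out_dim, rank))
        self.scale = alpha / rank

    def forward(self, x, new_language=False):
        base = x @ self.weight.T  # primary pipeline: pre-trained parameters only
        if not new_language:
            return base
        # Secondary pipeline: add the scaled low-rank update (B @ A) x.
        return base + self.scale * (x @ self.lora_a.T) @ self.lora_b.T
```

Because only `lora_a` and `lora_b` would be trained, inputs routed through the primary pipeline see exactly the original weights, which is one way to read the abstract's claim about limiting degradation on existing languages.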

Submitted 11 June, 2024; originally announced June 2024.

Comments: 5 pages, 2 figures, 4 tables

arXiv:2405.00146

Averting multi-qubit burst errors in surface code magic state factories

Authors: Jason D. Chadwick, Christopher Kang, Joshua Viszlai, Sophia Fuhui Lin, Frederic T. Chong

Abstract: Fault-tolerant quantum computation relies on the assumption of time-invariant, sufficiently low physical error rates. However, current superconducting quantum computers suffer from frequent disruptive noise events, including cosmic ray impacts and shifting two-level system defects. Several methods have been proposed to mitigate these issues in software, but they add large overheads in terms of physical qubit count, as it is difficult to preserve logical information through burst error events. We focus on mitigating multi-qubit burst errors in magic state factories, which are expected to comprise up to 95% of the space cost of future quantum programs. Our key insight is that magic state factories do not need to preserve logical information over time; once we detect an increase in local physical error rates, we can simply turn off parts of the factory that are affected, re-map the factory to the new chip geometry, and continue operating. This is much more efficient than previous more general methods, and is resilient even under many simultaneous impact events. Using precise physical noise models, we show an efficient ray detection method and evaluate our strategy in different noise regimes. Compared to existing baselines, we find reductions in ray-induced overheads by several orders of magnitude, reducing total qubitcycle cost by geomean 6.5x to 13.9x depending on the noise model. This work reduces the burden on hardware by providing low-overhead software mitigation of these errors.

Submitted 30 April, 2024; originally announced May 2024.

Comments: 13 pages, 12 figures

arXiv:2404.17962

Deep Learning for Low-Latency, Quantum-Ready RF Sensing

Authors: Pranav Gokhale, Caitlin Carnahan, William Clark, Frederic T. Chong

Abstract: Recent work has shown the promise of applying deep learning to enhance software processing of radio frequency (RF) signals. In parallel, hardware developments with quantum RF sensors based on Rydberg atoms are breaking longstanding barriers in frequency range, resolution, and sensitivity. In this paper, we describe our implementations of quantum-ready machine learning approaches for RF signal classification. Our primary objective is latency: while deep learning offers a more powerful computational paradigm, it also traditionally incurs latency overheads that hinder wider scale deployment. Our work spans three axes. (1) A novel continuous wavelet transform (CWT) based recurrent neural network (RNN) architecture that enables flexible online classification of RF signals on-the-fly with reduced sampling time. (2) Low-latency inference techniques for both GPU and CPU that span over 100x reductions in inference time, enabling real-time operation with sub-millisecond inference. (3) Quantum-readiness validated through application of our models to physics-based simulation of Rydberg atom QRF sensors. Altogether, our work bridges towards next-generation RF sensors that use quantum technology to surpass previous physical limits, paired with latency-optimized AI/ML software that is suitable for real-time deployment.

Submitted 27 April, 2024; originally announced April 2024.

arXiv:2401.05571

QuantumSEA: In-Time Sparse Exploration for Noise Adaptive Quantum Circuits

Authors: Tianlong Chen, Zhenyu Zhang, Hanrui Wang, Jiaqi Gu, Zirui Li, David Z. Pan, Frederic T. Chong, Song Han, Zhangyang Wang

Abstract: Parameterized Quantum Circuits (PQC) have obtained increasing popularity thanks to their great potential for near-term Noisy Intermediate-Scale Quantum (NISQ) computers. Achieving quantum advantages usually requires a large number of qubits and quantum circuits with enough capacity. However, limited coherence time and massive quantum noises severely constrain the size of quantum circuits that can be executed reliably on real machines. To address these two pain points, we propose QuantumSEA, an in-time sparse exploration for noise-adaptive quantum circuits, aiming to achieve two key objectives: (1) implicit circuits capacity during training - by dynamically exploring the circuit's sparse connectivity and sticking to a fixed small number of quantum gates throughout the training, which satisfies the coherence time and enjoys light noises, enabling feasible executions on real quantum devices; (2) noise robustness - by jointly optimizing the topology and parameters of quantum circuits under real device noise models. In each update step of sparsity, we leverage the moving average of historical gradients to grow necessary gates and utilize salience-based pruning to eliminate insignificant gates. Extensive experiments are conducted with 7 Quantum Machine Learning (QML) and Variational Quantum Eigensolver (VQE) benchmarks on 6 simulated or real quantum computers, where QuantumSEA consistently surpasses noise-aware search, human-designed, and randomly generated quantum circuit baselines by a clear performance margin. For example, even in the most challenging on-chip training regime, our method establishes state-of-the-art results with only half the number of quantum gates and ~2x time saving of circuit executions. Codes are available at https://github.com/VITA-Group/QuantumSEA.

Submitted 10 January, 2024; originally announced January 2024.

Comments: IEEE International Conference on Quantum Computing and Engineering (QCE 2023)

arXiv:2401.05339

MicroGlam: Microscopic Skin Image Dataset with Cosmetics

Authors: Toby Chong, Alina Chadwick, I-chao Shen, Haoran Xie, Takeo Igarashi

Abstract: In this paper, we present a cosmetic-specific skin image dataset. It consists of skin images from $45$ patches ($5$ skin patches each from $9$ participants) of size $8\,\mathrm{mm} \times 8\,\mathrm{mm}$ under three cosmetic products (i.e., foundation, blusher, and highlighter). We designed a novel capturing device inspired by Light Stage. Using the device, we captured over $600$ images of each skin patch under diverse lighting conditions in $30$ seconds. We repeated the process for the same skin patch under three cosmetic products. Finally, we demonstrate the viability of the dataset with an image-to-image translation-based pipeline for cosmetic rendering and compare our data-driven approach to an existing cosmetic rendering method.

Submitted 28 November, 2023; originally announced January 2024.

Comments: Project Page: https://github.com/tobyclh/MicroGlam

arXiv:2311.16214

DGR: Tackling Drifted and Correlated Noise in Quantum Error Correction via Decoding Graph Re-weighting

Authors: Hanrui Wang, Pengyu Liu, Yilian Liu, Jiaqi Gu, Jonathan Baker, Frederic T. Chong, Song Han

Abstract: Quantum hardware suffers from high error rates and noise, which makes directly running applications on them ineffective. Quantum Error Correction (QEC) is a critical technique towards fault tolerance which encodes the quantum information distributively in multiple data qubits and uses syndrome qubits to check parity. Minimum-Weight-Perfect-Matching (MWPM) is a popular QEC decoder that takes the syndromes as input and finds the matchings between syndromes that infer the errors. However, there are two paramount challenges for MWPM decoders. First, as noise in real quantum systems can drift over time, there is a potential misalignment with the decoding graph's initial weights, leading to a severe performance degradation in the logical error rates. Second, while the MWPM decoder addresses independent errors, it falls short when encountering correlated errors typical on real hardware, such as those in the 2Q depolarizing channel. We propose DGR, an efficient decoding graph edge re-weighting strategy with no quantum overhead. It leverages the insight that the statistics of matchings across decoding iterations offer rich information about errors on real quantum hardware. By counting the occurrences of edges and edge pairs in decoded matchings, we can statistically estimate the up-to-date probabilities of each edge and the correlations between them. The reweighting process includes two vital steps: alignment re-weighting and correlation re-weighting. The former updates the MWPM weights based on statistics to align with actual noise, and the latter adjusts the weight considering edge correlations. Extensive evaluations on surface code and honeycomb code under various settings show that DGR reduces the logical error rate by 3.6x on average-case noise mismatch, with an improvement exceeding 5000x under worst-case mismatch.
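
Illustrative sketch (not the authors' implementation): the alignment re-weighting step described in the abstract, counting how often each edge appears in decoded matchings and turning the estimated probability into a weight, might look like the following. The function name `reweight_edges`, the `floor` clipping parameter, and the log-likelihood weight $w = \ln((1-p)/p)$ are assumptions for this example; that formula is a common MWPM convention and is not confirmed by the abstract. Correlation re-weighting over edge pairs is left out.

```python
import math
from collections import Counter


def reweight_edges(matchings, edges, floor=1e-6):
    """Estimate edge error probabilities from matchings; return MWPM weights.

    matchings: list of decoded matchings, each an iterable of edge keys.
    edges: all edges of the decoding graph, keyed the same way as in matchings.
    """
    rounds = len(matchings)
    if rounds == 0:
        raise ValueError("need at least one decoded matching")
    counts = Counter(edge for matching in matchings for edge in matching)
    weights = {}
    for edge in edges:
        p = counts[edge] / rounds  # empirical frequency of this edge
        # Clip so the weight stays finite and strictly positive.
        p = min(max(p, floor), 0.5 - floor)
        weights[edge] = math.log((1.0 - p) / p)
    return weights
```

Edges that never appear in a matching receive the largest weight allowed by `floor`, so the decoder treats them as the least likely error locations until new statistics arrive.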

Submitted 22 April, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

Comments: 13 pages, 19 figures

arXiv:2311.16035

RobustState: Boosting Fidelity of Quantum State Preparation via Noise-Aware Variational Training

Authors: Hanrui Wang, Yilian Liu, Pengyu Liu, Jiaqi Gu, Zirui Li, Zhiding Liang, Jinglei Cheng, Yongshan Ding, Xuehai Qian, Yiyu Shi, David Z. Pan, Frederic T. Chong, Song Han

Abstract: Quantum state preparation, a crucial subroutine in quantum computing, involves generating a target quantum state from initialized qubits. Arbitrary state preparation algorithms can be broadly categorized into arithmetic decomposition (AD) and variational quantum state preparation (VQSP). AD employs a predefined procedure to decompose the target state into a series of gates, whereas VQSP iteratively tunes ansatz parameters to approximate the target state. VQSP is particularly apt for Noisy-Intermediate Scale Quantum (NISQ) machines due to its shorter circuits. However, achieving noise-robust parameter optimization still remains challenging. We present RobustState, a novel VQSP training methodology that combines high robustness with high training efficiency. The core idea involves utilizing measurement outcomes from real machines to perform back-propagation through classical simulators, thus incorporating real quantum noise into gradient calculations. RobustState serves as a versatile, plug-and-play technique applicable for training parameters from scratch or fine-tuning existing parameters to enhance fidelity on target machines. It is adaptable to various ansatzes at both gate and pulse levels and can even benefit other variational algorithms, such as variational unitary synthesis. Comprehensive evaluation of RobustState on state preparation tasks for 4 distinct quantum algorithms using 10 real quantum machines demonstrates a coherent error reduction of up to 7.1$\times$ and state fidelity improvement of up to 96\% and 81\% for 4-Q and 5-Q states, respectively. On average, RobustState improves fidelity by 50\% and 72\% for 4-Q and 5-Q states compared to baseline approaches.

Submitted 27 November, 2023; originally announced November 2023.

Comments: Accepted to FASTML @ ICCAD 2023. 14 pages, 20 figures

arXiv:2307.14996

Decomposing and Routing Quantum Circuits Under Constraints for Neutral Atom Architectures

Authors: Natalia Nottingham, Michael A. Perlin, Ryan White, Hannes Bernien, Frederic T. Chong, Jonathan M. Baker

Abstract: Quantum computing is in an era defined by rapidly evolving quantum hardware technologies, combined with persisting high gate error rates, large amounts of noise, and short coherence times. Overcoming these limitations requires systems-level approaches that account for the strengths and weaknesses of the underlying hardware technology. Yet few hardware-aware compiler techniques exist for neutral atom devices, with no prior work on compiling to the neutral atom native gate set. In particular, current neutral atom hardware does not support certain single-qubit rotations via local addressing, which often requires the circuit to be decomposed into a large number of gates, leading to long circuit durations and low overall fidelities. We propose the first compiler designed to overcome the challenges of limited local addressability in neutral atom quantum computers. We present algorithms to decompose circuits into the neutral atom native gate set, with emphasis on optimizing total pulse area of global gates, which dominate gate execution costs in several current architectures. Furthermore, we explore atom movement as an alternative to expensive gate decompositions, gaining immense speedup with routing, which remains a huge overhead for many quantum circuits. Our decomposition optimizations result in up to ~3.5x and ~2.9x speedup in time spent executing global gates and time spent executing single-qubit gates, respectively. When combined with our atom movement routing algorithms, our compiler achieves up to ~10x reduction in circuit duration, with over ~2x improvement in fidelity. We show that our compiler strategies can be adapted for a variety of hardware-level parameters as neutral atom technology continues to develop.

Submitted 27 July, 2023; originally announced July 2023.

Comments: 13 pages, 12 figures

arXiv:2307.14459

Training Quantum Boltzmann Machines with Coresets

Authors: Joshua Viszlai, Teague Tomesh, Pranav Gokhale, Eric Anschuetz, Frederic T. Chong

Abstract: Recent work has proposed and explored using coreset techniques for quantum algorithms that operate on classical data sets to accelerate the applicability of these algorithms on near-term quantum devices. We apply these ideas to Quantum Boltzmann Machines (QBM) where gradient-based steps which require Gibbs state sampling are the main computational bottleneck during training. By using a coreset in place of the full data set, we try to minimize the number of steps needed and accelerate the overall training time. In a regime where computational time on quantum computers is a precious resource, we propose this might lead to substantial practical savings. We evaluate this approach on 6x6 binary images from an augmented bars and stripes data set using a QBM with 36 visible units and 8 hidden units. Using an Inception score inspired metric, we compare QBM training times with and without using coresets.

Submitted 26 July, 2023; originally announced July 2023.

Comments: Appeared in IEEE International Conference on Quantum Computing and Engineering (QCE22) in September 2022

arXiv:2307.13460

Fundamental causal bounds of quantum random access memories

Authors: Yunfei Wang, Yuri Alexeev, Liang Jiang, Frederic T. Chong, Junyu Liu

Abstract: Quantum devices should operate in adherence to quantum physics principles. Quantum random access memory (QRAM), a fundamental component of many essential quantum algorithms for tasks such as linear algebra, data search, and machine learning, is often proposed to offer $\mathcal{O}(\log N)$ circuit depth for $\mathcal{O}(N)$ data size, given $N$ qubits. However, this claim appears to breach the principle of relativity when dealing with a large number of qubits in quantum materials interacting locally. In our study we critically explore the intrinsic bounds of rapid quantum memories based on causality, employing the relativistic quantum field theory and Lieb-Robinson bounds in quantum many-body systems. In this paper, we consider a hardware-efficient QRAM design in hybrid quantum acoustic systems. Assuming clock cycle times of approximately $10^{-3}$ seconds and a lattice spacing of about 1 micrometer, we show that QRAM can accommodate up to $\mathcal{O}(10^7)$ logical qubits in 1 dimension, $\mathcal{O}(10^{15})$ to $\mathcal{O}(10^{20})$ in various 2D architectures, and $\mathcal{O}(10^{24})$ in 3 dimensions. We contend that this causality bound broadly applies to other quantum hardware systems. Our findings highlight the impact of fundamental quantum physics constraints on the long-term performance of quantum computing applications in data science and suggest potential quantum memory designs for performance enhancement.

Submitted 25 July, 2023; originally announced July 2023.

Comments: 8+24=32 pages, many figures

arXiv:2306.15020

Clifford Assisted Optimal Pass Selection for Quantum Transpilation

Authors: Siddharth Dangwal, Gokul Subramanian Ravi, Lennart Maximilian Seifert, Frederic T. Chong

Abstract: The fidelity of quantum programs in the NISQ era is limited by high levels of device noise. To increase the fidelity of quantum programs running on NISQ devices, a variety of optimizations have been proposed. These include mapping passes, routing passes, scheduling methods and standalone optimisations which are usually incorporated into a transpiler as passes. Popular transpilers such as those proposed by Qiskit, Cirq and Cambridge Quantum Computing make use of these extensively. However, choosing the right set of transpiler passes and the right configuration for each pass is a challenging problem. Transpilers often make critical decisions using heuristics since the ideal choices are impossible to identify without knowing the target application outcome. Further, the transpiler also makes simplifying assumptions about device noise that often do not hold in the real world. As a result, we often see effects where the fidelity of a target application decreases despite using state-of-the-art optimisations. To overcome this challenge, we propose OPTRAN, a framework for Choosing an Optimal Pass Set for Quantum Transpilation. OPTRAN uses classically simulable quantum circuits composed entirely of Clifford gates, that resemble the target application, to estimate how different passes interact with each other in the context of the target application. OPTRAN then uses this information to choose the optimal combination of passes that maximizes the target application's fidelity when run on the actual device. Our experiments on IBM machines show that OPTRAN improves fidelity by 87.66% of the maximum possible limit over the baseline used by IBM Qiskit. We also propose low-cost variants of OPTRAN, called OPTRAN-E-3 and OPTRAN-E-1, that improve fidelity by 78.33% and 76.66% of the maximum permissible limit over the baseline at a 58.33% and 69.44% reduction in cost compared to OPTRAN, respectively.

Submitted 26 June, 2023; originally announced June 2023.

arXiv:2306.06027

VarSaw: Application-tailored Measurement Error Mitigation for Variational Quantum Algorithms

Authors: Siddharth Dangwal, Gokul Subramanian Ravi, Poulami Das, Kaitlin N. Smith, Jonathan M. Baker, Frederic T. Chong

Abstract: For potential quantum advantage, Variational Quantum Algorithms (VQAs) need high accuracy beyond the capability of today's NISQ devices, and thus will benefit from error mitigation. In this work we are interested in mitigating measurement errors which occur during qubit measurements after circuit execution and tend to be the most error-prone operations, especially detrimental to VQAs. Prior work, JigSaw, has shown that measuring only small subsets of circuit qubits at a time and collecting results across all such subset circuits can reduce measurement errors. Then, running the entire (global) original circuit and extracting the qubit-qubit measurement correlations can be used in conjunction with the subsets to construct a high-fidelity output distribution of the original circuit. Unfortunately, the execution cost of JigSaw scales polynomially in the number of qubits in the circuit, and when compounded by the number of circuits and iterations in VQAs, the resulting execution cost quickly turns insurmountable. To combat this, we propose VarSaw, which improves JigSaw in an application-tailored manner, by identifying considerable redundancy in the JigSaw approach for VQAs: spatial redundancy across subsets from different VQA circuits and temporal redundancy across globals from different VQA iterations. VarSaw then eliminates these forms of redundancy by commuting the subset circuits and selectively executing the global circuits, reducing computational cost (in terms of the number of circuits executed) over naive JigSaw for VQA by 25x on average and up to 1000x, for the same VQA accuracy. Further, it can recover, on average, 45% of the infidelity from measurement errors in the noisy VQA baseline. Finally, it improves fidelity by 55%, on average, over JigSaw for a fixed computational budget. VarSaw can be accessed here: https://github.com/siddharthdangwal/VarSaw.

Submitted 29 February, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

Comments: Appears at the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2024. First two authors contributed equally

arXiv:2305.03243

Microarchitectures for Heterogeneous Superconducting Quantum Computers

Authors: Samuel Stein, Sara Sussman, Teague Tomesh, Charles Guinn, Esin Tureci, Sophia Fuhui Lin, Wei Tang, James Ang, Srivatsan Chakram, Ang Li, Margaret Martonosi, Fred T. Chong, Andrew A. Houck, Isaac L. Chuang, Michael Austin DeMarco

Abstract: Noisy Intermediate-Scale Quantum Computing (NISQ) has dominated headlines in recent years, with the longer-term vision of Fault-Tolerant Quantum Computation (FTQC) offering significant potential albeit at currently intractable resource costs and quantum error correction (QEC) overheads. For problems of interest, FTQC will require millions of physical qubits with long coherence times, high-fidelity gates, and compact sizes to surpass classical systems. Just as heterogeneous specialization has offered scaling benefits in classical computing, it is likewise gaining interest in FTQC. However, systematic use of heterogeneity in either hardware or software elements of FTQC systems remains a serious challenge due to the vast design space and variable physical constraints. This paper meets the challenge of making heterogeneous FTQC design practical by introducing HetArch, a toolbox for designing heterogeneous quantum systems, and using it to explore heterogeneous design scenarios. Using a hierarchical approach, we successively break quantum algorithms into smaller operations (akin to classical application kernels), thus greatly simplifying the design space and resulting tradeoffs. Specializing to superconducting systems, we then design optimized heterogeneous hardware composed of varied superconducting devices, abstracting physical constraints into design rules that enable devices to be assembled into standard cells optimized for specific operations. Finally, we provide a heterogeneous design space exploration framework which reduces the simulation burden by a factor of 10^4 or more and allows us to characterize optimal design points. We use these techniques to design superconducting quantum modules for entanglement distillation, error correction, and code teleportation, reducing error rates by 2.6x, 10.7x, and 3.0x compared to homogeneous systems.

Submitted 4 May, 2023; originally announced May 2023.

arXiv:2303.14069

doi 10.1145/3579371.3589106

Dancing the Quantum Waltz: Compiling Three-Qubit Gates on Four Level Architectures

Authors: Andrew Litteken, Lennart Maximilian Seifert, Jason D. Chadwick, Natalia Nottingham, Tanay Roy, Ziqian Li, David Schuster, Frederic T. Chong, Jonathan M. Baker

Abstract: Superconducting quantum devices are a leading technology for quantum computation, but they suffer from several challenges. Gate errors, coherence errors and a lack of connectivity all contribute to low fidelity results. In particular, connectivity restrictions enforce a gate set that requires three-qubit gates to be decomposed into one- or two-qubit gates. This substantially increases the number of two-qubit gates that need to be executed. However, many quantum devices have access to higher energy levels. We can expand the qubit abstraction of $|0\rangle$ and $|1\rangle$ to a ququart which has access to the $|2\rangle$ and $|3\rangle$ state, but with shorter coherence times. This allows for two qubits to be encoded in one ququart, enabling increased virtual connectivity between physical units from two adjacent qubits to four fully connected qubits. This connectivity scheme allows us to more efficiently execute three-qubit gates natively between two physical devices. We present direct-to-pulse implementations of several three-qubit gates, synthesized via optimal control, for compilation of three-qubit gates onto a superconducting-based architecture with access to four-level devices with the first experimental demonstration of four-level ququart gates designed through optimal control. We demonstrate strategies that temporarily use higher level states to perform Toffoli gates and always use higher level states to improve fidelities for quantum circuits. We find that these methods improve expected fidelities with increases of 2x across circuit sizes using intermediate encoding, and increases of 3x for fully-encoded ququart compilation.

Submitted 27 February, 2024; v1 submitted 24 March, 2023; originally announced March 2023.

Comments: 14 pages, 9 figures, to be published at ISCA 2023

arXiv:2303.10546 [pdf, other]

Supporting Piggybacked Co-Located Leisure Activities via Augmented Reality

Authors: Samantha Reig, Erica Principe Cruz, Melissa M. Powers, Jennifer He, Timothy Chong, Yu Jiang Tham, Sven Kratz, Ava Robinson, Brian A. Smith, Rajan Vaish, Andrés Monroy-Hernández

Abstract: Technology, especially the smartphone, is villainized for taking meaning and time away from in-person interactions and secluding people into "digital bubbles". We believe this is not an intrinsic property of digital gadgets, but evidence of a lack of imagination in technology design. Leveraging augmented reality (AR) toward this end allows us to create experiences for multiple people, their pets, and their environments. In this work, we explore the design of AR technology that "piggybacks" on everyday leisure to foster co-located interactions among close ties (with other people and pets). We designed, developed, and deployed three such AR applications, and evaluated them through a 41-participant and 19-pet user study. We gained key insights about the ability of AR to spur and enrich interaction in new channels, the importance of customization, and the challenges of designing for the physical aspects of AR devices (e.g., holding smartphones). These insights guide design implications for the novel research space of co-located AR.

Submitted 18 March, 2023; originally announced March 2023.

arXiv:2303.02131 [pdf, other]

doi 10.22331/q-2024-02-15-1257

Spacetime-Efficient Low-Depth Quantum State Preparation with Applications

Authors: Kaiwen Gui, Alexander M. Dalzell, Alessandro Achille, Martin Suchara, Frederic T. Chong

Abstract: We propose a novel deterministic method for preparing arbitrary quantum states. When our protocol is compiled into CNOT and arbitrary single-qubit gates, it prepares an $N$-dimensional state in depth $O(\log(N))$ and spacetime allocation (a metric that accounts for the fact that oftentimes some ancilla qubits need not be active for the entire circuit) $O(N)$, which are both optimal. When compiled into the $\{\mathrm{H,S,T,CNOT}\}$ gate set, we show that it requires asymptotically fewer quantum resources than previous methods. Specifically, it prepares an arbitrary state up to error $ε$ with optimal depth of $O(\log(N) + \log (1/ε))$ and spacetime allocation $O(N\log(\log(N)/ε))$, improving over $O(\log(N)\log(\log (N)/ε))$ and $O(N\log(N/ε))$, respectively. We illustrate how the reduced spacetime allocation of our protocol enables rapid preparation of many disjoint states with only constant-factor ancilla overhead -- $O(N)$ ancilla qubits are reused efficiently to prepare a product state of $w$ $N$-dimensional states in depth $O(w + \log(N))$ rather than $O(w\log(N))$, achieving effectively constant depth per state. We highlight several applications where this ability would be useful, including quantum machine learning, Hamiltonian simulation, and solving linear systems of equations. We provide quantum circuit descriptions of our protocol, detailed pseudocode, and gate-level implementation examples using Braket.

Submitted 9 February, 2024; v1 submitted 3 March, 2023; originally announced March 2023.

Journal ref: Quantum 8, 1257 (2024)

arXiv:2303.00658 [pdf, other]

doi 10.1145/3575693.3575726

Qompress: Efficient Compilation for Ququarts Exploiting Partial and Mixed Radix Operations for Communication Reduction

Authors: Andrew Litteken, Lennart Maximilian Seifert, Jason Chadwick, Natalia Nottingham, Fredric T. Chong, Jonathan M. Baker

Abstract: Quantum computing is in an era of limited resources. Current hardware lacks high fidelity gates, long coherence times, and the number of computational units required to perform meaningful computation. Contemporary quantum devices typically use a binary system, where each qubit exists in a superposition of the $\ket{0}$ and $\ket{1}$ states. However, it is often possible to access the $\ket{2}$ or even $\ket{3}$ states in the same physical unit by manipulating the system in different ways. In this work, we consider automatically encoding two qubits into one four-state ququart via a compression scheme. We use quantum optimal control to design efficient proof-of-concept gates that fully replicate standard qubit computation on these encoded qubits. We extend qubit compilation schemes to efficiently route qubits on an arbitrary mixed-radix system consisting of both qubits and ququarts, reducing communication and minimizing excess circuit execution time introduced by longer-duration ququart gates. In conjunction with these compilation strategies, we introduce several methods to find beneficial compressions, reducing circuit error due to computation and communication by up to 50%. These methods can increase the computational space available on a limited near-term machine by up to 2x while maintaining circuit fidelity.

Submitted 2 March, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

Comments: 14 pages, 13 figures, 1 table, to be published at ASPLOS 2023

Journal ref: ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, January 2023, Pages 646-659
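
The Qompress abstract above describes encoding two qubits into one four-level ququart. As a rough illustration only, the sketch below assumes the basis state |a b> is stored at ququart level 2a + b; this index convention, and the names `encode`, `decode`, `X_FIRST` and `X_SECOND`, are hypothetical and not taken from the paper. It shows how a single-qubit X gate on either encoded qubit becomes a 4x4 permutation of ququart levels.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def encode(a: int, b: int) -> int:
    """Map the two-qubit basis state |a b> to a ququart level (assumed: 2a + b)."""
    return 2 * a + b


def decode(level: int) -> Tuple[int, int]:
    """Inverse of encode: recover (a, b) from a ququart level 0..3."""
    return divmod(level, 2)


def kron(A: Matrix, B: Matrix) -> Matrix:
    """Kronecker product of two small integer matrices."""
    return [[x * y for x in ra for y in rb] for ra in A for rb in B]


def apply(op: Matrix, vec: List[int]) -> List[int]:
    """Matrix-vector product."""
    return [sum(op[r][c] * vec[c] for c in range(len(vec))) for r in range(len(op))]


def basis(level: int) -> List[int]:
    """Unit vector for one of the four ququart levels."""
    return [1 if i == level else 0 for i in range(4)]


X: Matrix = [[0, 1], [1, 0]]
I2: Matrix = [[1, 0], [0, 1]]

# A qubit gate on one encoded qubit acts on the whole four-level system.
X_FIRST = kron(X, I2)   # flips a, leaves b alone
X_SECOND = kron(I2, X)  # flips b, leaves a alone

if __name__ == "__main__":
    state = basis(encode(0, 1))              # |01> stored at level 1
    flipped = apply(X_FIRST, state)          # X on the first encoded qubit
    print(decode(flipped.index(1)))          # (1, 1), i.e. level 3
```

Under this convention, a gate between the two encoded qubits never leaves the physical unit, which is the kind of communication saving the abstract refers to; the actual pulse-level gates in the paper are designed with optimal control and are not modeled here.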

arXiv:2302.02003 [pdf, other]

doi 10.1109/ISCAS46773.2023.10181370

QContext: Context-Aware Decomposition for Quantum Gates

Authors: Ji Liu, Max Bowman, Pranav Gokhale, Siddharth Dangwal, Jeffrey Larson, Frederic T. Chong, Paul D. Hovland

Abstract: In this paper we propose QContext, a new compiler structure that incorporates context-aware and topology-aware decompositions. Because of circuit equivalence rules and resynthesis, variants of a gate-decomposition template may exist. QContext exploits the circuit information and the hardware topology to select the gate variant that increases circuit optimization opportunities. We study the basis-gate-level context-aware decomposition for Toffoli gates and the native-gate-level context-aware decomposition for CNOT gates. Our experiments show that QContext reduces the number of gates as compared with the state-of-the-art approach, Orchestrated Trios.

Submitted 3 February, 2023; originally announced February 2023.

Comments: 10 pages

arXiv:2302.01553 [pdf, other]

doi 10.1109/QCE57702.2023.00145

Efficient control pulses for continuous quantum gate families through coordinated re-optimization

Authors: Jason D. Chadwick, Frederic T. Chong

Abstract: We present a general method to quickly generate high-fidelity control pulses for any continuously-parameterized set of quantum gates after calibrating a small number of reference pulses. We find that interpolating between optimized control pulses for different quantum operations does not immediately yield a high-fidelity intermediate operation. To solve this problem, we propose a method to optimize control pulses specifically to provide good interpolations. We pick several reference operations in the gate family of interest and optimize pulses that implement these operations, then iteratively re-optimize the pulses to guide their shapes to be similar for operations that are closely related. Once this set of reference pulses is calibrated, we can use a straightforward linear interpolation method to instantly obtain high-fidelity pulses for arbitrary gates in the continuous operation space. We demonstrate this procedure on the three-parameter Cartan decomposition of two-qubit gates to obtain control pulses for any arbitrary two-qubit gate (up to single-qubit operations) with consistently high fidelity. Compared to previous neural network approaches, the method is 7.7x more computationally efficient to calibrate the pulse space for the set of all single-qubit gates. Our technique generalizes to any number of gate parameters and could easily be used with advanced pulse optimization algorithms to allow for better translation from simulation to experiment.

Submitted 31 July, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

Comments: 9 pages, 6 figures, 2 tables; appearing in QCE 2023
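
The abstract above ends with a lookup step: once reference pulses are calibrated, a pulse for an arbitrary gate is obtained by linear interpolation. The sketch below illustrates only that lookup, for a gate family with a single parameter `theta`; the function name `interpolate_pulse`, the one-parameter restriction, and the representation of a pulse as a list of sampled amplitudes are all assumptions made for illustration. The coordinated re-optimization that makes such interpolation accurate is the paper's contribution and is not reproduced here.

```python
from bisect import bisect_right
from typing import List, Sequence


def interpolate_pulse(
    thetas: Sequence[float],
    pulses: Sequence[Sequence[float]],
    theta: float,
) -> List[float]:
    """Linearly interpolate between the two reference pulses that bracket theta.

    thetas: strictly increasing gate parameters of the reference pulses (>= 2 entries).
    pulses: one sampled waveform per reference parameter, all of equal length.
    """
    if len(thetas) < 2 or len(thetas) != len(pulses):
        raise ValueError("need at least two reference pulses, one per theta")
    if not thetas[0] <= theta <= thetas[-1]:
        raise ValueError("theta lies outside the calibrated range")

    # Index of the upper bracketing reference, clamped so theta == thetas[-1] works.
    hi = min(bisect_right(thetas, theta), len(thetas) - 1)
    lo = hi - 1
    weight = (theta - thetas[lo]) / (thetas[hi] - thetas[lo])

    # Interpolate sample by sample along the time axis.
    return [(1 - weight) * p0 + weight * p1 for p0, p1 in zip(pulses[lo], pulses[hi])]


if __name__ == "__main__":
    reference_thetas = [0.0, 1.0, 2.0]
    reference_pulses = [[0.0, 1.0], [2.0, 3.0], [4.0, 1.0]]  # toy two-sample waveforms
    print(interpolate_pulse(reference_thetas, reference_pulses, 0.5))
```

For the multi-parameter families mentioned in the abstract, the same idea would apply along each parameter axis, though how the paper arranges its reference points is not stated in the abstract.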

arXiv:2211.16469 [pdf, other]

doi 10.1109/ISMVL52857.2022.00014

Communication Trade Offs in Intermediate Qudit Circuits

Authors: Andrew Litteken, Jonathan M. Baker, Frederic T. Chong

Abstract: Quantum computing promises speedup of classical algorithms in the long term. Current hardware is unable to support this goal and programs must be efficiently compiled to make use of the devices through reduction of qubits used, gate count and circuit duration. Many quantum systems have access to higher levels, expanding the computational space for a device. We develop higher level qudit communication circuits, compilation pipelines, and circuits that take advantage of this extra space by temporarily pushing qudits into these higher levels. We show how these methods are able to more efficiently use the device, and where they see diminishing returns.

Submitted 29 November, 2022; originally announced November 2022.

Comments: 7 pages, 9 Figures, In ISMVL22: 2022 IEEE 52nd International Symposium on Multiple-Valued Logic

arXiv:2211.15757 [pdf, other]

Reducing Runtime Overhead via Use-Based Migration in Neutral Atom Quantum Architectures

Authors: Andrew Litteken, Jonathan M. Baker, Frederic T. Chong

Abstract: Neutral atoms are a promising choice for scalable quantum computing architectures. Features such as long distance interactions and native multiqubit gates offer reductions in communication costs and operation count. However, the trapped atoms used as qubits can be lost over the course of computation and due to adverse environmental factors. The value of a lost computation qubit cannot be recovered and requires the reloading of the array and rerunning of the computation, greatly increasing the number of runs of a circuit. Software mitigation strategies exist but exhaust the original mapped locations of the circuit slowly and create more spread out clusters of qubits across the architecture, decreasing the probability of success. First, we increase flexibility by developing strategies that find all reachable qubits, rather than only adjacent hardware qubits. Second, we divide the architecture into separate sections, and run the circuit in each section, free of lost atoms. Provided the architecture is large enough, this resets the circuit without having to reload the entire architecture. This increases the number of effective shots before reloading by a factor of two for a circuit that utilizes 30% of the architecture. We also explore using these sections to parallelize execution of circuits, reducing the overall runtime by a total of 50% for a 30-qubit circuit. These techniques contribute to a dynamic new set of strategies to combat the detrimental effects of lost computational space.

Submitted 28 November, 2022; originally announced November 2022.

Comments: 11 pages, 11 Figures, In QCE22: 2022 IEEE International Conference on Quantum Computing & Engineering

arXiv:2211.12711 [pdf, other]

doi 10.1109/QCE57702.2023.00034

SnCQA: A hardware-efficient equivariant quantum convolutional circuit architecture

Authors: Han Zheng, Christopher Kang, Gokul Subramanian Ravi, Hanrui Wang, Kanav Setia, Frederic T. Chong, Junyu Liu

Abstract: We propose SnCQA, a set of hardware-efficient variational circuits of equivariant quantum convolutional circuits respective to permutation symmetries and spatial lattice symmetries with the number of qubits $n$. By exploiting permutation symmetries of the system, such as lattice Hamiltonians common to many quantum many-body and quantum chemistry problems, our quantum neural networks are suitable for solving machine learning problems where permutation symmetries are present, which could lead to significant savings of computational costs. Aside from its theoretical novelty, we find our simulations perform well in practical instances of learning ground states in quantum computational chemistry, where we could achieve comparable performances to traditional methods with few tens of parameters. Compared to other traditional variational quantum circuits, such as the pure hardware-efficient ansatz (pHEA), we show that SnCQA is more scalable, accurate, and noise resilient (with $20\times$ better performance on $3 \times 4$ square lattice and $200\% - 1000\%$ resource savings in various lattice sizes and key criteria such as the number of layers, parameters, and times to converge in our cases), suggesting a potentially favorable experiment on near-time quantum devices.

Submitted 22 September, 2023; v1 submitted 23 November, 2022; originally announced November 2022.

Comments: 10 pages, many figures. IEEE QCE 2023, 1st best paper award in quantum algorithms

Journal ref: 2023 IEEE International Conference on Quantum Computing and Engineering (QCE), 2023, pp. 236-245

arXiv:2210.16724 [pdf, other]

doi 10.1145/3508352.3561118

QuEst: Graph Transformer for Quantum Circuit Reliability Estimation

Authors: Hanrui Wang, Pengyu Liu, Jinglei Cheng, Zhiding Liang, Jiaqi Gu, Zirui Li, Yongshan Ding, Weiwen Jiang, Yiyu Shi, Xuehai Qian, David Z. Pan, Frederic T. Chong, Song Han

Abstract: Among different quantum algorithms, PQC for QML show promises on near-term devices. To facilitate the QML and PQC research, a recent python library called TorchQuantum has been released. It can construct, simulate, and train PQC for machine learning tasks with high speed and convenient debugging supports. Besides quantum for ML, we want to raise the community's attention on the reversed direction: ML for quantum. Specifically, the TorchQuantum library also supports using data-driven ML models to solve problems in quantum system research, such as predicting the impact of quantum noise on circuit fidelity and improving the quantum circuit compilation efficiency. This paper presents a case study of the ML for quantum part. Since estimating the noise impact on circuit reliability is an essential step toward understanding and mitigating noise, we propose to leverage classical ML to predict noise impact on circuit fidelity. Inspired by the natural graph representation of quantum circuits, we propose to leverage a graph transformer model to predict the noisy circuit fidelity. We firstly collect a large dataset with a variety of quantum circuits and obtain their fidelity on noisy simulators and real machines. Then we embed each circuit into a graph with gate and noise properties as node features, and adopt a graph transformer to predict the fidelity.
Evaluated on 5 thousand random and algorithm circuits, the graph transformer predictor can provide accurate fidelity estimation with RMSE error 0.04 and outperform a simple neural network-based model by 0.02 on average. It can achieve 0.99 and 0.95 R$^2$ scores for random and algorithm circuits, respectively. Compared with circuit simulators, the predictor has over 200X speedup for estimating the fidelity.

Submitted 29 October, 2022; originally announced October 2022.

Comments: ICCAD 2022; 10 pages, 10 figures; code at https://github.com/mit-han-lab/torchquantum

arXiv:2210.15876 [pdf, ps, other]

Random Utterance Concatenation Based Data Augmentation for Improving Short-video Speech Recognition

Authors: Yist Y. Lin, Tao Han, Haihua Xu, Van Tung Pham, Yerbolat Khassanov, Tze Yuang Chong, Yi He, Lu Lu, Zejun Ma

Abstract: One limitation of the end-to-end automatic speech recognition (ASR) framework is that its performance is compromised when train and test utterance lengths are mismatched. In this paper, we propose an on-the-fly random utterance concatenation (RUC) based data augmentation method to alleviate the train-test utterance length mismatch issue for the short-video ASR task. Specifically, we are motivated by observations that our human-transcribed training utterances tend to be much shorter for short-video spontaneous speech (~3 seconds on average), while our test utterances generated from the voice activity detection front-end are much longer (~10 seconds on average). Such a mismatch can lead to suboptimal performance. Empirically, it is observed that the proposed RUC method significantly improves long utterance recognition without a performance drop on short ones. Overall, it achieves 5.72% word error rate reduction on average for 15 languages and improved robustness to various utterance lengths.

Submitted 25 May, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

Comments: 5 pages, 3 figures, 4 tables
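
The abstract above describes on-the-fly random utterance concatenation (RUC): short training utterances are joined into longer ones so that training lengths better match test lengths. The sketch below is a minimal illustration under assumptions that the abstract does not specify: an utterance is modeled as a pair of audio samples and transcript, the number of joined utterances is drawn uniformly from 1 to a hypothetical `max_concat`, and transcripts are joined with a space. The paper's actual sampling policy may differ.

```python
import random
from typing import List, Optional, Sequence, Tuple

Utterance = Tuple[List[float], str]  # (audio samples or feature frames, transcript)


def random_utterance_concat(
    pool: Sequence[Utterance],
    max_concat: int = 3,
    rng: Optional[random.Random] = None,
) -> Utterance:
    """Build one augmented training example by joining randomly chosen utterances.

    max_concat is a hypothetical knob: the upper bound on how many utterances
    are joined into a single example.
    """
    if not pool:
        raise ValueError("utterance pool is empty")
    rng = rng or random.Random()

    # Draw how many utterances to join, then pick that many without replacement.
    count = rng.randint(1, min(max_concat, len(pool)))
    chosen = rng.sample(list(pool), count)

    # Concatenate audio end to end and join transcripts in the same order.
    audio = [sample for samples, _ in chosen for sample in samples]
    text = " ".join(transcript for _, transcript in chosen)
    return audio, text


if __name__ == "__main__":
    toy_pool = [([0.1] * 2, "hello"), ([0.2] * 3, "there"), ([0.3] * 4, "friend")]
    audio, text = random_utterance_concat(toy_pool, max_concat=3, rng=random.Random(0))
    print(len(audio), text)
```

Because the function is called each time a training example is requested, the set of concatenations changes from epoch to epoch, which is one reading of "on-the-fly" in the abstract.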

arXiv:2209.12280 [pdf, other]

Navigating the dynamic noise landscape of variational quantum algorithms with QISMET

Authors: Gokul Subramanian Ravi, Kaitlin N. Smith, Jonathan M. Baker, Tejas Kannan, Nathan Earnest, Ali Javadi-Abhari, Henry Hoffmann, Frederic T. Chong

Abstract: Transient errors from the dynamic NISQ noise landscape are challenging to comprehend and are especially detrimental to classes of applications that are iterative and/or long-running, and therefore their timely mitigation is important for quantum advantage in real-world applications. The most popular examples of iterative long-running quantum applications are variational quantum algorithms (VQAs). Iteratively, VQA's classical optimizer evaluates circuit candidates on an objective function and picks the best circuits towards achieving the application's target. Noise fluctuation can cause a significant transient impact on the objective function estimation of the VQA iterations / tuning candidates. This can severely affect VQA tuning and, by extension, its accuracy and convergence. This paper proposes QISMET: Quantum Iteration Skipping to Mitigate Error Transients, to navigate the dynamic noise landscape of VQAs. QISMET actively avoids instances of high fluctuating noise which are predicted to have a significant transient error impact on specific VQA iterations. To achieve this, QISMET estimates transient error in VQA iterations and designs a controller to keep the VQA tuning faithful to the transient-free scenario. By doing so, QISMET efficiently mitigates a large portion of the transient noise impact on VQAs and is able to improve the fidelity by 1.3x-3x over a traditional VQA baseline, with 1.6-2.4x improvement over alternative approaches, across different applications and machines.
Further, to diligently analyze the effects of transients, this work also builds transient noise models for target VQA applications from observing real machine transients. These are then integrated with the Qiskit simulator.

Submitted 29 September, 2023; v1 submitted 25 September, 2022; originally announced September 2022.

Comments: Appears at the 28th Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2023)

arXiv:2208.08547 [pdf, other]

Better Than Worst-Case Decoding for Quantum Error Correction

Authors: Gokul Subramanian Ravi, Jonathan M. Baker, Arash Fayyazi, Sophia Fuhui Lin, Ali Javadi-Abhari, Massoud Pedram, Frederic T. Chong

Abstract: The overheads of classical decoding for quantum error correction on superconducting quantum systems grow rapidly with the number of logical qubits and their correction code distance. Decoding at room temperature is bottle-necked by refrigerator I/O bandwidth while cryogenic on-chip decoding is limited by area/power/thermal budget. To overcome these overheads, we are motivated by the observation that in the common case, error signatures are fairly trivial with high redundancy/sparsity, since the error correction codes are over-provisioned to correct for uncommon worst-case complex scenarios (to ensure substantially low logical error rates). If suitably exploited, these trivial signatures can be decoded and corrected with insignificant overhead, thereby alleviating the bottlenecks described above, while still handling the worst-case complex signatures by state-of-the-art means. Our proposal, targeting Surface Codes, consists of: 1) Clique: A lightweight decoder for decoding and correcting trivial common-case errors, designed for the cryogenic domain. The decoder is implemented for SFQ logic. 2) A statistical confidence-based technique for off-chip decoding bandwidth allocation, to efficiently handle rare complex decodes which are not covered by the on-chip decoder. 3) A method for stalling circuit execution, for the worst-case scenarios in which the provisioned off-chip bandwidth is insufficient to complete all requested off-chip decodes.
In all, our proposal enables 70-99+% off-chip bandwidth elimination across a range of logical and physical error rates, without significantly sacrificing the accuracy of state-of-the-art off-chip decoding. By doing so, it achieves 10-10000x bandwidth reduction over prior off-chip bandwidth reduction techniques. Furthermore, it achieves a 15-37x resource overhead reduction compared to prior on-chip-only decoding.

Submitted 25 October, 2022; v1 submitted 17 August, 2022; originally announced August 2022.

Comments: To appear at the 28th Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2023)

arXiv:2207.07771 [pdf, other]

Auggie: Encouraging Effortful Communication through Handcrafted Digital Experiences

Authors: Lei Zhang, Tianying Chen, Olivia Seow, Tim Chong, Sven Kratz, Yu Jiang Tham, Andrés Monroy-Hernández, Rajan Vaish, Fannie Liu

Abstract: Digital communication is often brisk and automated. From auto-completed messages to "likes," research has shown that such lightweight interactions can affect perceptions of authenticity and closeness. On the other hand, effort in relationships can forge emotional bonds by conveying a sense of caring and is essential in building and maintaining relationships. To explore effortful communication, we designed and evaluated Auggie, an iOS app that encourages partners to create digitally handcrafted Augmented Reality (AR) experiences for each other. Auggie is centered around crafting a 3D character with photos, animated movements, drawings, and audio for someone else. We conducted a two-week-long field study with 30 participants (15 pairs), who used Auggie with their partners remotely. Our qualitative findings show that Auggie participants engaged in meaningful effort through the handcrafting process, and felt closer to their partners, although the tool may not be appropriate in all situations. We discuss design implications and future directions for systems that encourage effortful communication.

Submitted 15 July, 2022; originally announced July 2022.

Comments: To appear at the 25th ACM Conference On Computer-Supported Cooperative Work And Social Computing (CSCW '22). 25 pages

arXiv:2205.00661 [pdf, other]

Giallar: Push-Button Verification for the Qiskit Quantum Compiler

Authors: Runzhou Tao, Yunong Shi, Jianan Yao, Xupeng Li, Ali Javadi-Abhari, Andrew W. Cross, Frederic T. Chong, Ronghui Gu

Abstract: This paper presents Giallar, a fully-automated verification toolkit for quantum compilers. Giallar requires no manual specifications, invariants, or proofs, and can automatically verify that a compiler pass preserves the semantics of quantum circuits. To deal with unbounded loops in quantum compilers, Giallar abstracts three loop templates, whose loop invariants can be automatically inferred. To efficiently check the equivalence of arbitrary input and output circuits that have complicated matrix semantics representation, Giallar introduces a symbolic representation for quantum circuits and a set of rewrite rules for showing the equivalence of symbolic quantum circuits. With Giallar, we implemented and verified 44 (out of 56) compiler passes in 13 versions of the Qiskit compiler, the open-source quantum compiler standard, during which three bugs were detected in and confirmed by Qiskit. Our evaluation shows that most of Qiskit compiler passes can be automatically verified in seconds and verification imposes only a modest overhead to compilation performance.

Submitted 2 May, 2022; originally announced May 2022.

Comments: PLDI 2022; Improves arXiv:1908.08963

arXiv:2203.13260 [pdf, other]

Adaptive job and resource management for the growing quantum cloud

Authors: Gokul Subramanian Ravi, Kaitlin N. Smith, Prakash Murali, Frederic T. Chong

Abstract: As the popularity of quantum computing continues to grow, efficient quantum machine access over the cloud is critical to both academic and industry researchers across the globe. And as cloud quantum computing demands increase exponentially, the analysis of resource consumption and execution characteristics are key to efficient management of jobs and resources at both the vendor-end as well as the client-end. While the analysis and optimization of job / resource consumption and management are popular in the classical HPC domain, it is severely lacking for more nascent technology like quantum computing. This paper proposes optimized adaptive job scheduling to the quantum cloud taking note of primary characteristics such as queuing times and fidelity trends across machines, as well as other characteristics such as quality of service guarantees and machine calibration constraints. Key components of the proposal include a) a prediction model which predicts fidelity trends across machines based on compiled circuit features such as circuit depth and different forms of errors, as well as b) queuing time prediction for each machine based on execution time estimations. Overall, this proposal is evaluated on simulated IBM machines across a diverse set of quantum applications and system loading scenarios, and is able to reduce wait times by over 3x and improve fidelity by over 40% on specific use cases, when compared to traditional job schedulers.

Submitted 24 March, 2022; originally announced March 2022.

Comments: Appeared at the 2021 IEEE International Conference on Quantum Computing and Engineering. arXiv admin note: substantial text overlap with arXiv:2203.13121

arXiv:2203.13121 [pdf, other]

Quantum Computing in the Cloud: Analyzing job and machine characteristics

Authors: Gokul Subramanian Ravi, Kaitlin N. Smith, Pranav Gokhale, Frederic T. Chong

Abstract: As the popularity of quantum computing continues to grow, quantum machine access over the cloud is critical to both academic and industry researchers across the globe. And as cloud quantum computing demands increase exponentially, the analysis of resource consumption and execution characteristics are key to efficient management of jobs and resources at both the vendor-end as well as the client-end. While the analysis of resource consumption and management are popular in the classical HPC domain, it is severely lacking for more nascent technology like quantum computing. This paper is a first-of-its-kind academic study, analyzing various trends in job execution and resources consumption / utilization on quantum cloud systems. We focus on IBM Quantum systems and analyze characteristics over a two year period, encompassing over 6000 jobs which contain over 600,000 quantum circuit executions and correspond to almost 10 billion "shots" or trials over 20+ quantum machines. Specifically, we analyze trends focused on, but not limited to, execution times on quantum machines, queuing/waiting times in the cloud, circuit compilation times, machine utilization, as well as the impact of job and machine characteristics on all of these trends. Our analysis identifies several similarities and differences with classical HPC cloud systems. Based on our insights, we make recommendations and contributions to improve the management of resources and jobs on future quantum cloud systems.

Submitted 24 March, 2022; originally announced March 2022.

Comments: Appeared at the 2021 IEEE International Symposium on Workload Characterization

arXiv:2203.12713 [pdf, other]

Optimized Quantum Program Execution Ordering to Mitigate Errors in Simulations of Quantum Systems

Authors: Teague Tomesh, Kaiwen Gui, Pranav Gokhale, Yunong Shi, Frederic T. Chong, Margaret Martonosi, Martin Suchara

Abstract: Simulating the time evolution of a physical system at quantum mechanical levels of detail -- known as Hamiltonian Simulation (HS) -- is an important and interesting problem across physics and chemistry. For this task, algorithms that run on quantum computers are known to be exponentially faster than classical algorithms; in fact, this application motivated Feynman to propose the construction of quantum computers. Nonetheless, there are challenges in reaching this performance potential. Prior work has focused on compiling circuits (quantum programs) for HS with the goal of maximizing either accuracy or gate cancellation. Our work proposes a compilation strategy that simultaneously advances both goals. At a high level, we use classical optimizations such as graph coloring and travelling salesperson to order the execution of quantum programs. Specifically, we group together mutually commuting terms in the Hamiltonian (a matrix characterizing the quantum mechanical system) to improve the accuracy of the simulation. We then rearrange the terms within each group to maximize gate cancellation in the final quantum circuit. These optimizations work together to improve HS performance and result in an average 40% reduction in circuit depth. This work advances the frontier of HS which in turn can advance physical and chemical modeling in both basic and applied sciences.

Submitted 23 March, 2022; originally announced March 2022.

Comments: 13 pages, 7 figures, Awarded Best Paper during the IEEE International Conference on Rebooting Computing (ICRC) 2021
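
The grouping step mentioned in the abstract above (collecting mutually commuting Hamiltonian terms) can be illustrated with a minimal sketch. It assumes the Hamiltonian is given as Pauli strings and uses a simple first-fit greedy coloring; the function names `commute` and `group_commuting` and the example terms are hypothetical, and this is not the authors' implementation, which also reorders terms within each group.

```python
def commute(p: str, q: str) -> bool:
    """Two Pauli strings commute iff they differ (both non-identity and
    unequal) on an even number of qubit positions."""
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 0


def group_commuting(terms: list[str]) -> list[list[str]]:
    """First-fit greedy coloring: place each term in the first group whose
    members all commute with it, opening a new group when none fits."""
    groups: list[list[str]] = []
    for term in terms:
        for group in groups:
            if all(commute(term, member) for member in group):
                group.append(term)
                break
        else:
            groups.append([term])
    return groups


# Illustrative terms of a 3-qubit transverse-field Ising-style Hamiltonian.
terms = ["ZZI", "IZZ", "XII", "IXI", "IIX"]
groups = group_commuting(terms)
print(groups)  # [['ZZI', 'IZZ'], ['XII', 'IXI', 'IIX']]
```

Greedy coloring does not guarantee the minimum number of groups; it is used here only because it is short and deterministic.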

arXiv:2202.12924 [pdf, other]

CAFQA: A classical simulation bootstrap for variational quantum algorithms

Authors: Gokul Subramanian Ravi, Pranav Gokhale, Yi Ding, William M. Kirby, Kaitlin N. Smith, Jonathan M. Baker, Peter J. Love, Henry Hoffmann, Kenneth R. Brown, Frederic T. Chong

Abstract: This work tackles the problem of finding a good ansatz initialization for Variational Quantum Algorithms (VQAs), by proposing CAFQA, a Clifford Ansatz For Quantum Accuracy. The CAFQA ansatz is a hardware-efficient circuit built with only Clifford gates. In this ansatz, the parameters for the tunable gates are chosen by searching efficiently through the Clifford parameter space via classical simulation. The resulting initial states always equal or outperform traditional classical initialization (e.g., Hartree-Fock), and enable high-accuracy VQA estimations. CAFQA is well-suited to classical computation because: a) Clifford-only quantum circuits can be exactly simulated classically in polynomial time, and b) the discrete Clifford space is searched efficiently via Bayesian Optimization. For the Variational Quantum Eigensolver (VQE) task of molecular ground state energy estimation (up to 18 qubits), CAFQA's Clifford Ansatz achieves a mean accuracy of nearly 99% and recovers as much as 99.99% of the molecular correlation energy that is lost in Hartree-Fock initialization. CAFQA achieves mean accuracy improvements of 6.4x and 56.8x, over the state-of-the-art, on different metrics. The scalability of the approach allows for preliminary ground state energy estimation of the challenging chromium dimer (Cr$_2$) molecule. With CAFQA's high-accuracy initialization, the convergence of VQAs is shown to accelerate by 2.5x, even for small molecules. Furthermore, preliminary exploration of allowing a limited number of non-Clifford (T) gates in the CAFQA framework, shows that as much as 99.9% of the correlation energy can be recovered at bond lengths for which Clifford-only CAFQA accuracy is relatively limited, while remaining classically simulable.

Submitted 29 September, 2023; v1 submitted 25 February, 2022; originally announced February 2022.

Comments: Appears at the 28th Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2023). Previous title - CAFQA: Clifford Ansatz For Quantum Accuracy. Paper revised to ASPLOS requirements, added additional improvements to the CAFQA framework / evaluation. Added preliminary exploration on CAFQA with T gates
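
As a toy illustration of the idea in the abstract above (restricting tunable rotation angles to Clifford points and searching that discrete space classically), the sketch below uses a single qubit, the ansatz Ry(θ)|0⟩, and a made-up Hamiltonian H = aZ + bX. Two assumptions are worth flagging: the search is exhaustive rather than the Bayesian Optimization the abstract names, and the closed-form energy a·cos θ + b·sin θ stands in for a stabilizer simulator. All names and coefficients are hypothetical.

```python
import math

# Ry rotations by multiples of pi/2 are Clifford gates.
CLIFFORD_ANGLES = [k * math.pi / 2 for k in range(4)]


def energy(theta: float, a: float = 1.0, b: float = 0.5) -> float:
    """Expectation of H = a*Z + b*X in the state Ry(theta)|0>,
    using <Z> = cos(theta) and <X> = sin(theta)."""
    return a * math.cos(theta) + b * math.sin(theta)


# Exhaustive search over the four Clifford points (assumption: a stand-in
# for the Bayesian Optimization mentioned in the abstract).
best_k = min(range(4), key=lambda k: energy(CLIFFORD_ANGLES[k]))
best_energy = energy(CLIFFORD_ANGLES[best_k])

# Exact ground energy of a*Z + b*X is -sqrt(a^2 + b^2).
exact_ground = -math.hypot(1.0, 0.5)

print(best_k, round(best_energy, 6), round(exact_ground, 6))
```

The Clifford point found this way is only a starting guess; a continuous optimizer would then refine θ toward the exact ground energy.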

arXiv:2202.11045 [pdf, other]

SupermarQ: A Scalable Quantum Benchmark Suite

Authors: Teague Tomesh, Pranav Gokhale, Victory Omole, Gokul Subramanian Ravi, Kaitlin N. Smith, Joshua Viszlai, Xin-Chuan Wu, Nikos Hardavellas, Margaret R. Martonosi, Frederic T. Chong

Abstract: The emergence of quantum computers as a new computational paradigm has been accompanied by speculation concerning the scope and timeline of their anticipated revolutionary changes. While quantum computing is still in its infancy, the variety of different architectures used to implement quantum computations make it difficult to reliably measure and compare performance. This problem motivates our introduction of SupermarQ, a scalable, hardware-agnostic quantum benchmark suite which uses application-level metrics to measure performance. SupermarQ is the first attempt to systematically apply techniques from classical benchmarking methodology to the quantum domain. We define a set of feature vectors to quantify coverage, select applications from a variety of domains to ensure the suite is representative of real workloads, and collect benchmark results from the IBM, IonQ, and AQT@LBNL platforms. Looking forward, we envision that quantum benchmarking will encompass a large cross-community effort built on open source, constantly evolving benchmark suites. We introduce SupermarQ as an important step in this direction.

Submitted 27 April, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

Comments: 17 pages, 4 figures, Awarded Best Paper during the 28th IEEE International Symposium on High-Performance Computer Architecture (HPCA-28), Seoul, South Korea

arXiv:2201.08825 [pdf, other]

Modeling Short-Range Microwave Networks to Scale Superconducting Quantum Computation

Authors: Nicholas LaRacuente, Kaitlin N. Smith, Poolad Imany, Kevin L. Silverman, Frederic T. Chong

Abstract: A core challenge for superconducting quantum computers is to scale up the number of qubits in each processor without increasing noise or cross-talk. Distributed quantum computing across small qubit arrays, known as chiplets, can address these challenges in a scalable manner. We propose a chiplet architecture over microwave links with potential to exceed monolithic performance on near-term hardware. Our methods of modeling and evaluating the chiplet architecture bridges the physical and network layers in these processors. We find evidence that distributing computation across chiplets may reduce the overall error rates associated with moving data across the device, despite higher error figures for transfers across links. Preliminary analyses suggest that latency is not substantially impacted, and that at least some applications and architectures may avoid bottlenecks around chiplet boundaries. In the long-term, short-range networks may underlie quantum computers just as local area networks underlie classical datacenters and supercomputers today.

Submitted 5 January, 2023; v1 submitted 21 January, 2022; originally announced January 2022.

Comments: 23 pages, 11 figures

arXiv:2111.06469 [pdf, other]

Exploiting Long-Distance Interactions and Tolerating Atom Loss in Neutral Atom Quantum Architectures

Authors: Jonathan M. Baker, Andrew Litteken, Casey Duckering, Henry Hoffman, Hannes Bernien, Frederic T. Chong

Abstract: Quantum technologies currently struggle to scale beyond moderate scale prototypes and are unable to execute even reasonably sized programs due to prohibitive gate error rates or coherence times. Many software approaches rely on heavy compiler optimization to squeeze extra value from noisy machines but are fundamentally limited by hardware. Alone, these software approaches help to maximize the use of available hardware but cannot overcome the inherent limitations posed by the underlying technology. An alternative approach is to explore the use of new, though potentially less developed, technology as a path towards scalability. In this work we evaluate the advantages and disadvantages of a Neutral Atom (NA) architecture. NA systems offer several promising advantages such as long range interactions and native multiqubit gates which reduce communication overhead, overall gate count, and depth for compiled programs. Long range interactions, however, impede parallelism with restriction zones surrounding interacting qubit pairs. We extend current compiler methods to maximize the benefit of these advantages and minimize the cost. Furthermore, atoms in an NA device have the possibility to randomly be lost over the course of program execution which is extremely detrimental to total program execution time as atom arrays are slow to load. When the compiled program is no longer compatible with the underlying topology, we need a fast and efficient coping mechanism. We propose hardware and compiler methods to increase system resilience to atom loss dramatically reducing total computation time by circumventing complete reloads or full recompilation every cycle.

Submitted 11 November, 2021; originally announced November 2021.

Comments: 14 pages, 14 figures, In ISCA '21: The 48th International Symposium on Computer Architecture

arXiv:2110.11331 [pdf, other]

doi 10.1145/3489517.3530400

QuantumNAT: Quantum Noise-Aware Training with Noise Injection, Quantization and Normalization

Authors: Hanrui Wang, Jiaqi Gu, Yongshan Ding, Zirui Li, Frederic T. Chong, David Z. Pan, Song Han

Abstract: Parameterized Quantum Circuits (PQC) are promising towards quantum advantage on near-term quantum hardware. However, due to the large quantum noises (errors), the performance of PQC models has a severe degradation on real quantum devices. Take Quantum Neural Network (QNN) as an example, the accuracy gap between noise-free simulation and noisy results on IBMQ-Yorktown for MNIST-4 classification is over 60%. Existing noise mitigation methods are general ones without leveraging unique characteristics of PQC; on the other hand, existing PQC work does not consider noise effect. To this end, we present QuantumNAT, a PQC-specific framework to perform noise-aware optimizations in both training and inference stages to improve robustness. We experimentally observe that the effect of quantum noise to PQC measurement outcome is a linear map from noise-free outcome with a scaling and a shift factor. Motivated by that, we propose post-measurement normalization to mitigate the feature distribution differences between noise-free and noisy scenarios. Furthermore, to improve the robustness against noise, we propose noise injection to the training process by inserting quantum error gates to PQC according to realistic noise models of quantum hardware. Finally, post-measurement quantization is introduced to quantize the measurement outcomes to discrete values, achieving the denoising effect. Extensive experiments on 8 classification tasks using 6 quantum devices demonstrate that QuantumNAT improves accuracy by up to 43%, and achieves over 94% 2-class, 80% 4-class, and 34% 10-class classification accuracy measured on real quantum computers. The code for construction and noise-aware training of PQC is available in the TorchQuantum library.

Submitted 13 June, 2023; v1 submitted 21 October, 2021; originally announced October 2021.

Comments: Published as a conference paper at DAC 2022; 10 pages, 9 figures; TorchQuantum open-source at https://github.com/mit-han-lab/torchquantum
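
The abstract above observes that noise acts on measurement outcomes roughly as a linear map (a scale and a shift) and motivates post-measurement normalization from that. A minimal sketch of why normalization helps follows, assuming outcomes are standardized per qubit over a batch; the batch-statistics choice, the `normalize` helper, and the scale 0.3 and shift 0.1 are illustrative assumptions, and none of this uses the TorchQuantum API.

```python
import statistics


def normalize(values: list[float]) -> list[float]:
    """Standardize a batch of measurement outcomes to zero mean, unit variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Noise-free expectation values for one qubit across a batch of four inputs.
clean = [0.8, -0.2, 0.5, -0.6]

# Hypothetical noise model: outcomes shrunk by 0.3 and shifted by 0.1.
noisy = [0.3 * v + 0.1 for v in clean]

# A positive scale and any shift cancel under standardization.
max_diff = max(abs(c - n) for c, n in zip(normalize(clean), normalize(noisy)))
print(max_diff < 1e-9)  # True
```

Real device noise is only approximately linear, so in practice the normalized noisy and noise-free features would be close rather than identical.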

arXiv:2109.04654 [pdf, other]

Per Garment Capture and Synthesis for Real-time Virtual Try-on

Authors: Toby Chong, I-Chao Shen, Nobuyuki Umetani, Takeo Igarashi

Abstract: Virtual try-on is a promising application of computer graphics and human computer interaction that can have a profound real-world impact especially during this pandemic. Existing image-based works try to synthesize a try-on image from a single image of a target garment, but it inherently limits the ability to react to possible interactions. It is difficult to reproduce the change of wrinkles caused by pose and body size change, as well as pulling and stretching of the garment by hand. In this paper, we propose an alternative per garment capture and synthesis workflow to handle such rich interactions by training the model with many systematically captured images. Our workflow is composed of two parts: garment capturing and clothed person image synthesis. We designed an actuated mannequin and an efficient capturing process that collects the detailed deformations of the target garments under diverse body sizes and poses. Furthermore, we proposed to use a custom-designed measurement garment, and we captured paired images of the measurement garment and the target garments. We then learn a mapping between the measurement garment and the target garments using deep image-to-image translation. The customer can then try on the target garments interactively during online shopping.

Submitted 9 September, 2021; originally announced September 2021.

Comments: Accepted to UIST2021. Project page: https://sites.google.com/view/deepmannequin/home

arXiv:2109.00133 [pdf, other]

AugLimb: Compact Robotic Limb for Human Augmentation

Authors: Zeyu Ding, Shogo Yoshida, Toby Chong, Tsukasa Fukusato, Takuma Torii, Haoran Xie

Abstract: This work proposes a compact robotic limb, AugLimb, that can augment our body functions and support the daily activities. AugLimb adopts the double-layer scissor unit for the extendable mechanism which can achieve 2.5 times longer than the forearm length. The proposed device can be mounted on the user's upper arm, and transform into compact state without obstruction to wearers. The proposed device is lightweight with low burden exerted on the wearer. We developed the prototype of AugLimb to demonstrate the proposed mechanisms. We believe that the design methodology of AugLimb can facilitate human augmentation research for practical use. see http://www.jaist.ac.jp/~xie/auglimb.html

Submitted 31 August, 2021; originally announced September 2021.

Comments: 2 pages, 3 figures

arXiv:2107.10845 [pdf, other]

doi 10.1109/HPCA53966.2022.00057

QuantumNAS: Noise-Adaptive Search for Robust Quantum Circuits

Authors: Hanrui Wang, Yongshan Ding, Jiaqi Gu, Zirui Li, Yujun Lin, David Z. Pan, Frederic T. Chong, Song Han

Abstract: Quantum noise is the key challenge in Noisy Intermediate-Scale Quantum (NISQ) computers. Previous work for mitigating noise has primarily focused on gate-level or pulse-level noise-adaptive compilation. However, limited research efforts have explored a higher level of optimization by making the quantum circuits themselves resilient to noise. We propose QuantumNAS, a comprehensive framework for noise-adaptive co-search of the variational circuit and qubit mapping. Variational quantum circuits are a promising approach for constructing QML and quantum simulation. However, finding the best variational circuit and its optimal parameters is challenging due to the large design space and parameter training cost. We propose to decouple the circuit search and parameter training by introducing a novel SuperCircuit. The SuperCircuit is constructed with multiple layers of pre-defined parameterized gates and trained by iteratively sampling and updating the parameter subsets (SubCircuits) of it. It provides an accurate estimation of SubCircuits performance trained from scratch. Then we perform an evolutionary co-search of SubCircuit and its qubit mapping. The SubCircuit performance is estimated with parameters inherited from SuperCircuit and simulated with real device noise models. Finally, we perform iterative gate pruning and finetuning to remove redundant gates. Extensively evaluated with 12 QML and VQE benchmarks on 14 quantum computers, QuantumNAS significantly outperforms baselines. For QML, QuantumNAS is the first to demonstrate over 95% 2-class, 85% 4-class, and 32% 10-class classification accuracy on real QC. It also achieves the lowest eigenvalue for VQE tasks on H2, H2O, LiH, CH4, BeH2 compared with UCCSD. We also open-source TorchQuantum (https://github.com/mit-han-lab/torchquantum) for fast training of parameterized quantum circuits to facilitate future research.

Submitted 6 January, 2022; v1 submitted 22 July, 2021; originally announced July 2021.

Comments: Published as a conference paper in HPCA 2022. 19 pages, 22 figures. TorchQuantum Code available at https://github.com/mit-han-lab/torchquantum

arXiv:2104.06349 [pdf, other]

Gleipnir: Toward Practical Error Analysis for Quantum Programs (Extended Version)

Authors: Runzhou Tao, Yunong Shi, Jianan Yao, John Hui, Frederic T. Chong, Ronghui Gu

Abstract: Practical error analysis is essential for the design, optimization, and evaluation of Noisy Intermediate-Scale Quantum (NISQ) computing. However, bounding errors in quantum programs is a grand challenge, because the effects of quantum errors depend on exponentially large quantum states. In this work, we present Gleipnir, a novel methodology toward practically computing verified error bounds in quantum programs. Gleipnir introduces the $(\hat{\rho},\delta)$-diamond norm, an error metric constrained by a quantum predicate consisting of the approximate state $\hat{\rho}$ and its distance $\delta$ to the ideal state $\rho$. This predicate $(\hat{\rho},\delta)$ can be computed adaptively using tensor networks based on the Matrix Product States. Gleipnir features a lightweight logic for reasoning about error bounds in noisy quantum programs, based on the $(\hat{\rho},\delta)$-diamond norm metric. Our experimental results show that Gleipnir is able to efficiently generate tight error bounds for real-world quantum programs with 10 to 100 qubits, and can be used to evaluate the error mitigation performance of quantum compiler transformations.

Submitted 19 April, 2021; v1 submitted 13 April, 2021; originally announced April 2021.

Comments: typos corrected

arXiv:2104.01572 [pdf, other]

TransfoRNN: Capturing the Sequential Information in Self-Attention Representations for Language Modeling

Authors: Tze Yuang Chong, Xuyang Wang, Lin Yang, Junjie Wang

Abstract: In this paper, we describe the use of recurrent neural networks to capture sequential information from the self-attention representations to improve the Transformers. Although self-attention mechanism provides a means to exploit long context, the sequential information, i.e. the arrangement of tokens, is not explicitly captured. We propose to cascade the recurrent neural networks to the Transformers, which referred to as the TransfoRNN model, to capture the sequential information. We found that the TransfoRNN models which consists of only shallow Transformers stack is suffice to give comparable, if not better, performance than a deeper Transformer model. Evaluated on the Penn Treebank and WikiText-2 corpora, the proposed TransfoRNN model has shown lower model perplexities with fewer number of model parameters. On the Penn Treebank corpus, the model perplexities were reduced up to 5.5% with the model size reduced up to 10.5%. On the WikiText-2 corpus, the model perplexity was reduced up to 2.2% with a 27.7% smaller model. Also, the TransfoRNN model was applied on the LibriSpeech speech recognition task and has shown comparable results with the Transformer models.

Submitted 4 April, 2021; originally announced April 2021.

Comments: INTERSPEECH 2021 (under review)

arXiv:2103.04544 [pdf, other]

Exploring a Makeup Support System for Transgender Passing based on Automatic Gender Recognition

Authors: Toby Chong, Nolwenn Maudet, Katsuki Harima, Takeo Igarashi

Abstract: How to handle gender with machine learning is a controversial topic. A growing critical body of research brought attention to the numerous issues transgender communities face with the adoption of current automatic gender recognition (AGR) systems. In contrast, we explore how such technologies could potentially be appropriated to support transgender practices and needs, especially in non-Western contexts like Japan. We designed a virtual makeup probe to assist transgender individuals with passing, that is to be perceived as the gender they identify as. To understand how such an application might support expressing transgender individuals gender identity or not, we interviewed 15 individuals in Tokyo and found that in the right context and under strict conditions, AGR based systems could assist transgender passing.

Submitted 7 March, 2021; originally announced March 2021.

Comments: Accepted to CHI2021. Project Page: https://sites.google.com/view/flyingcolor

arXiv:2102.08451 [pdf, other]

doi 10.1145/3445814.3446718

Orchestrated Trios: Compiling for Efficient Communication in Quantum Programs with 3-Qubit Gates

Authors: Casey Duckering, Jonathan M. Baker, Andrew Litteken, Frederic T. Chong

Abstract: Current quantum computers are especially error prone and require high levels of optimization to reduce operation counts and maximize the probability the compiled program will succeed. These computers only support operations decomposed into one- and two-qubit gates and only two-qubit gates between physically connected pairs of qubits. Typical compilers first decompose operations, then route data to connected qubits. We propose a new compiler structure, Orchestrated Trios, that first decomposes to the three-qubit Toffoli, routes the inputs of the higher-level Toffoli operations to groups of nearby qubits, then finishes decomposition to hardware-supported gates. This significantly reduces communication overhead by giving the routing pass access to the higher-level structure of the circuit instead of discarding it. A second benefit is the ability to now select an architecture-tuned Toffoli decomposition such as the 8-CNOT Toffoli for the specific hardware qubits now known after the routing pass. We perform real experiments on IBM Johannesburg showing an average 35% decrease in two-qubit gate count and 23% increase in success rate of a single Toffoli over Qiskit. We additionally compile many near-term benchmark algorithms showing an average 344% increase in (or 4.44x) simulated success rate on the Johannesburg architecture and compare with other architecture types.

Submitted 16 February, 2021; originally announced February 2021.

Comments: In ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 12 pages, 12 figures

arXiv:2009.01982 [pdf, other]

Virtualized Logical Qubits: A 2.5D Architecture for Error-Corrected Quantum Computing

Authors: Casey Duckering, Jonathan M. Baker, David I. Schuster, Frederic T. Chong

Abstract: Current, near-term quantum devices have shown great progress in recent years culminating with a demonstration of quantum supremacy. In the medium-term, however, quantum machines will need to transition to greater reliability through error correction, likely through promising techniques such as surface codes which are well suited for near-term devices with limited qubit connectivity. We discover quantum memory, particularly resonant cavities with transmon qubits arranged in a 2.5D architecture, can efficiently implement surface codes with substantial hardware savings and performance/fidelity gains. Specifically, we *virtualize logical qubits* by storing them in layers distributed across qubit memories connected to each transmon. Surprisingly, distributing each logical qubit across many memories has a minimal impact on fault tolerance and results in substantially more efficient operations. Our design permits fast transversal CNOT operations between logical qubits sharing the same physical address which are 6x faster than lattice surgery CNOTs. We develop a novel embedding which saves ~10x in transmons with another 2x from an additional optimization for compactness. Although Virtualized Logical Qubits (VLQ) pays a 10x penalty in serialization, advantages in the transversal CNOT and area efficiency result in performance comparable to 2D transmon-only architectures. Our simulations show fault tolerance comparable to 2D architectures while saving substantial hardware. Furthermore, VLQ can produce magic states 1.22x faster for a fixed number of transmon qubits. This is a critical benchmark for future fault-tolerant quantum computers. VLQ substantially reduces the hardware requirements for fault tolerance and puts within reach a proof-of-concept experimental demonstration of around 10 logical qubits, requiring only 11 transmons and 9 attached cavities in total.

Submitted 3 September, 2020; originally announced September 2020.

Comments: 12 pages, 13 figures, In MICRO '20: 53rd IEEE/ACM International Symposium on Microarchitecture

arXiv:2005.12259 [pdf, other]

doi 10.1145/3387902.3392617

Time-Sliced Quantum Circuit Partitioning for Modular Architectures

Authors: Jonathan M. Baker, Casey Duckering, Alexander Hoover, Frederic T. Chong

Abstract: Current quantum computer designs will not scale. To scale beyond small prototypes, quantum architectures will likely adopt a modular approach with clusters of tightly connected quantum bits and sparser connections between clusters. We exploit this clustering and the statically-known control flow of quantum programs to create tractable partitioning heuristics which map quantum circuits to modular physical machines one time slice at a time. Specifically, we create optimized mappings for each time slice, accounting for the cost to move data from the previous time slice and using a tunable lookahead scheme to reduce the cost to move to future time slices. We compare our approach to a traditional statically-mapped, owner-computes model. Our results show strict improvement over the static mapping baseline. We reduce the non-local communication overhead by 89.8% in the best case and by 60.9% on average. Our techniques, unlike many exact solver methods, are computationally tractable.

Submitted 25 May, 2020; originally announced May 2020.

Comments: Appears in CF'20: ACM International Conference on Computing Frontiers

Journal ref: 17th ACM International Conference on Computing Frontiers (2020)

arXiv:2004.14970 [pdf, other]

Coreset Clustering on Small Quantum Computers

Authors: Teague Tomesh, Pranav Gokhale, Eric R. Anschuetz, Frederic T. Chong

Abstract: Many quantum algorithms for machine learning require access to classical data in superposition. However, for many natural data sets and algorithms, the overhead required to load the data set in superposition can erase any potential quantum speedup over classical algorithms. Recent work by Harrow introduces a new paradigm in hybrid quantum-classical computing to address this issue, relying on coresets to minimize the data loading overhead of quantum algorithms. We investigate using this paradigm to perform $k$-means clustering on near-term quantum computers, by casting it as a QAOA optimization instance over a small coreset. We compare the performance of this approach to classical $k$-means clustering both numerically and experimentally on IBM Q hardware. We are able to find data sets where coresets work well relative to random sampling and where QAOA could potentially outperform standard $k$-means on a coreset. However, finding data sets where both coresets and QAOA work well--which is necessary for a quantum advantage over $k$-means on the entire data set--appears to be challenging.

Submitted 30 April, 2020; originally announced April 2020.

arXiv:2002.10592 [pdf, other]

Efficient Quantum Circuit Decompositions via Intermediate Qudits

Authors: Jonathan M. Baker, Casey Duckering, Frederic T. Chong

Abstract: Many quantum algorithms make use of ancilla, additional qubits used to store temporary information during computation, to reduce the total execution time. Quantum computers will be resource-constrained for years to come so reducing ancilla requirements is crucial. In this work, we give a method to generate ancilla out of idle qubits by placing some in higher-value states, called qudits. We show how to take a circuit with many $O(n)$ ancilla and design an ancilla-free circuit with the same asymptotic depth. Using this, we give a circuit construction for an in-place adder and a constant adder both with $O(\log n)$ depth using temporary qudits and no ancilla.

Submitted 24 February, 2020; originally announced February 2020.

Comments: 6 pages, 4 figures, In ISMVL 2020: IEEE International Symposium on Multiple-Valued Logic

arXiv:1908.08963 [pdf, other]

CertiQ: A Mostly-automated Verification of a Realistic Quantum Compiler

Authors: Yunong Shi, Runzhou Tao, Xupeng Li, Ali Javadi-Abhari, Andrew W. Cross, Frederic T. Chong, Ronghui Gu

Abstract: We present CertiQ, a verification framework for writing and verifying compiler passes of Qiskit, the most widely-used quantum compiler. To our knowledge, CertiQ is the first effort enabling the verification of real-world quantum compiler passes in a mostly-automated manner. Compiler passes written in the CertiQ interface with annotations can be used to generate verification conditions, as well as the executable code that can be integrated into Qiskit. CertiQ introduces the quantum circuit calculus to enable the efficient checking of equivalence of quantum circuits by encoding such a checking procedure into an SMT problem. CertiQ also provides a verified library of widely-used data structures, transformation functions for circuits, and conversion functions for different quantum data representations. This verified library not only enables modular verification but also sheds light on future quantum compiler design. We have re-implemented and verified 26 (out of 30) Qiskit compiler passes in CertiQ, during which three bugs are detected in the Qiskit implementation. Our verified compiler pass implementations passed all of Qiskit's regression tests without showing noticeable performance loss.

Submitted 26 November, 2020; v1 submitted 23 August, 2019; originally announced August 2019.

arXiv:1905.10481 [pdf, other]

doi 10.1145/3307650.3322253

Asymptotic Improvements to Quantum Circuits via Qutrits

Authors: Pranav Gokhale, Jonathan M. Baker, Casey Duckering, Natalie C. Brown, Kenneth R. Brown, Frederic T. Chong

Abstract: Quantum computation is traditionally expressed in terms of quantum bits, or qubits. In this work, we instead consider three-level qutrits. Past work with qutrits has demonstrated only constant factor improvements, owing to the $\log_2(3)$ binary-to-ternary compression factor. We present a novel technique using qutrits to achieve a logarithmic depth (runtime) decomposition of the Generalized Toffoli gate using no ancilla--a significant improvement over linear depth for the best qubit-only equivalent. Our circuit construction also features a 70x improvement in two-qudit gate count over the qubit-only equivalent decomposition. This results in circuit cost reductions for important algorithms like quantum neurons and Grover search. We develop an open-source circuit simulator for qutrits, along with realistic near-term noise models which account for the cost of operating qutrits. Simulation results for these noise models indicate over 90% mean reliability (fidelity) for our circuit construction, versus under 30% for the qubit-only baseline. These results suggest that qutrits offer a promising path towards scaling quantum computation.

Submitted 24 May, 2019; originally announced May 2019.

Comments: In ISCA '19: 46th International Symposium on Computer Architecture, 13 pages, 11 figures

arXiv:1904.01671 [pdf, ps, other]

Decomposing Quantum Generalized Toffoli with an Arbitrary Number of Ancilla

Authors: Jonathan M. Baker, Casey Duckering, Alexander Hoover, Frederic T. Chong

Abstract: We present a general decomposition of the Generalized Toffoli, and for completeness, the multi-target gate using an arbitrary number of clean or dirty ancilla. While prior work has shown how to decompose the Generalized Toffoli using 0, 1, or $O(n)$ many clean ancilla and 0, 1, and $n-2$ dirty ancilla, we provide a generalized algorithm to bridge the gap, i.e. this work gives an algorithm to generate a decomposition for any number of clean or dirty ancilla. While it is hard to guarantee optimality, our decompositions guarantee a decrease in circuit depth as the number of ancilla increases.

Submitted 2 April, 2019; originally announced April 2019.

Comments: 10 pages, 5 figures

Showing 1–50 of 59 results for author: Chong, T