Search | arXiv e-print repository

Qwerty: A Basis-Oriented Quantum Programming Language

Authors: Austin J. Adams, Sharjeel Khan, Jeffrey S. Young, Thomas M. Conte

Abstract: Quantum computers have evolved from the theoretical realm into a race to large-scale implementations. This is due to the promise of revolutionary speedups, where achieving such speedup requires designing an algorithm that harnesses the structure of a problem using quantum mechanics. Yet many quantum programming languages today require programmers to reason at a low level of quantum gate circuitry.… ▽ More Quantum computers have evolved from the theoretical realm into a race to large-scale implementations. This is due to the promise of revolutionary speedups, where achieving such speedup requires designing an algorithm that harnesses the structure of a problem using quantum mechanics. Yet many quantum programming languages today require programmers to reason at a low level of quantum gate circuitry. This presents a significant barrier to entry for programmers who have not yet built up an intuition about quantum gate semantics, and it can prove to be tedious even for those who have. In this paper, we present Qwerty, a new quantum programming language that allows programmers to manipulate qubits more expressively than gates, relegating the tedious task of gate selection to the compiler. Due to its novel basis type and easy interoperability with Python, Qwerty is a powerful framework for high-level quantum-classical computation. △ Less

Submitted 18 April, 2024; originally announced April 2024.

Comments: 30 pages, 27 figures

arXiv:2301.11559 [pdf, other]

doi 10.1109/IPDPSW59300.2023.00090

Enabling Multi-threading in Heterogeneous Quantum-Classical Programming Models

Authors: Akihiro Hayashi, Austin Adams, Jeffrey Young, Alexander McCaskey, Eugene Dumitrescu, Vivek Sarkar, Thomas M. Conte

Abstract: In this paper, we address some of the key limitations to realizing a generic heterogeneous parallel programming model for quantum-classical heterogeneous platforms. We discuss our experience in enabling user-level multi-threading in QCOR as well as challenges that need to be addressed for programming future quantum-classical systems. Specifically, we discuss our design and implementation of introd… ▽ More In this paper, we address some of the key limitations to realizing a generic heterogeneous parallel programming model for quantum-classical heterogeneous platforms. We discuss our experience in enabling user-level multi-threading in QCOR as well as challenges that need to be addressed for programming future quantum-classical systems. Specifically, we discuss our design and implementation of introducing C++-based parallel constructs to enable 1) parallel execution of a quantum kernel with std::thread and 2) asynchronous execution with std::async. To do so, we provide a detailed overview of the current implementation of the QCOR programming model and runtime, and discuss how we add 1) thread-safety to some of its user-facing API routines, and 2) increase parallelism in QCOR by removing data races that inhibit multi-threading so as to better utilize available computing resources. We also present preliminary performance results with the Quantum++ back end on a single-node Ryzen9 3900X machine that has 12 physical cores (24 hardware threads) with 128GB of RAM. The results show that running two Bell kernels with 12 threads per kernel in parallel outperforms running the kernels one after the other each with 24 threads (1.63x improvement). In addition, we observe the same trend when running two Shor's algorthm kernels in parallel (1.22x faster than executing the kernels one after the other). Furthermore, the parallel version is better in terms of strong scalability. We believe that our design, implementation, and results will open up an opportunity not only for 1) enabling quicker prototy** of parallel/asynchrony-aware quantum-classical algorithms on quantum circuit simulators in the short-term, but also for 2) realizing a generic heterogeneous parallel programming model for quantum-classical heterogeneous platforms in the long-term. △ Less

Submitted 15 March, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

arXiv:2204.05959 [pdf]

"Smarter" NICs for faster molecular dynamics: a case study

Authors: Sara Karamati, Clayton Hughes, K. Scott Hemmert, Ryan E. Grant, W. Whit Schonbein, Scott Levy, Thomas M. Conte, Jeffrey Young, Richard W. Vuduc

Abstract: This work evaluates the benefits of using a "smart" network interface card (SmartNIC) as a compute accelerator for the example of the MiniMD molecular dynamics proxy application. The accelerator is NVIDIA's BlueField-2 card, which includes an 8-core Arm processor along with a small amount of DRAM and storage. We test the networking and data movement performance of these cards compared to a standar… ▽ More This work evaluates the benefits of using a "smart" network interface card (SmartNIC) as a compute accelerator for the example of the MiniMD molecular dynamics proxy application. The accelerator is NVIDIA's BlueField-2 card, which includes an 8-core Arm processor along with a small amount of DRAM and storage. We test the networking and data movement performance of these cards compared to a standard Intel server host using microbenchmarks and MiniMD. In MiniMD, we identify two distinct classes of computation, namely core computation and maintenance computation, which are executed in sequence. We restructure the algorithm and code to weaken this dependence and increase task parallelism, thereby making it possible to increase utilization of the BlueField-2 concurrently with the host. We evaluate our implementation on a cluster consisting of 16 dual-socket Intel Broadwell host nodes with one BlueField-2 per host-node. Our results show that while the overall compute performance of BlueField-2 is limited, using them with a modified MiniMD algorithm allows for up to 20% speedup over the host CPU baseline with no loss in simulation accuracy. △ Less

Submitted 12 April, 2022; originally announced April 2022.

arXiv:2111.00146 [pdf, other]

Enabling a Programming Environment for an Experimental Ion Trap Quantum Testbed

Authors: Austin Adams, Elton Pinto, Jeffrey Young, Creston Herold, Alex McCaskey, Eugene Dumitrescu, Thomas M. Conte

Abstract: Ion trap quantum hardware promises to provide a computational advantage over classical computing for specific problem spaces while also providing an alternative hardware implementation path to cryogenic quantum systems as typified by IBM's quantum hardware. However, programming ion trap systems currently requires both strategies to mitigate high levels of noise and also tools to ease the challenge… ▽ More Ion trap quantum hardware promises to provide a computational advantage over classical computing for specific problem spaces while also providing an alternative hardware implementation path to cryogenic quantum systems as typified by IBM's quantum hardware. However, programming ion trap systems currently requires both strategies to mitigate high levels of noise and also tools to ease the challenge of programming these systems with pulse- or gate-level operations. This work focuses on improving the state-of-the-art for quantum programming of ion trap testbeds through the use of a quantum language specification, QCOR, and by demonstrating multi-level optimizations at the language, intermediate representation, and hardware backend levels. We implement a new QCOR/XACC backend to target a general ion trap testbed and then demonstrate the usage of multi-level optimizations to improve circuit fidelity and to reduce gate count. These techniques include the usage of a backend-specific numerical optimizer and physical gate optimizations to minimize the number of native instructions sent to the hardware. We evaluate our compiler backend using several QCOR benchmark programs, finding that on present testbed hardware, our compiler backend maintains the number of two-qubit native operations but decreases the number of single-qubit native operations by 1.54 times compared to the previous compiler regime. For projected testbed hardware upgrades, our compiler sees a reduction in two-qubit native operations by 2.40 times and one-qubit native operations by 6.13 times. △ Less

Submitted 4 November, 2021; v1 submitted 29 October, 2021; originally announced November 2021.

Comments: 10 pages, 6 figures; added missing author to metadata

arXiv:2101.01284 [pdf]

Advancing Computing's Foundation of US Industry & Society

Authors: Thomas M. Conte, Ian T. Foster, William Gropp, Mark D. Hill

Abstract: While past information technology (IT) advances have transformed society, future advances hold even greater promise. For example, we have only just begun to reap the changes from artificial intelligence (AI), especially machine learning (ML). Underlying IT's impact are the dramatic improvements in computer hardware, which deliver performance that unlock new capabilities. For example, recent succes… ▽ More While past information technology (IT) advances have transformed society, future advances hold even greater promise. For example, we have only just begun to reap the changes from artificial intelligence (AI), especially machine learning (ML). Underlying IT's impact are the dramatic improvements in computer hardware, which deliver performance that unlock new capabilities. For example, recent successes in AI/ML required the synergy of improved algorithms and hardware architectures (e.g., general-purpose graphics processing units). However, unlike in the 20th Century and early 2000s, tomorrow's performance aspirations must be achieved without continued semiconductor scaling formerly provided by Moore's Law and Dennard Scaling. How will one deliver the next 100x improvement in capability at similar or less cost to enable great value? Can we make the next AI leap without 100x better hardware? This whitepaper argues for a multipronged effort to develop new computing approaches beyond Moore's Law to advance the foundation that computing provides to US industry, education, medicine, science, and government. This impact extends far beyond the IT industry itself, as IT is now central for providing value across society, for example in semi-autonomous vehicles, tele-education, health wearables, viral analysis, and efficient administration. Herein we draw upon considerable visioning work by CRA's Computing Community Consortium (CCC) and the IEEE Rebooting Computing Initiative (IEEE RCI), enabled by thought leader input from industry, academia, and the US government. △ Less

Submitted 4 January, 2021; originally announced January 2021.

Comments: A Computing Community Consortium (CCC) white paper, 4 pages

Report number: ccc2020whitepaper_17

arXiv:1901.02775 [pdf, other]

Programming Strategies for Irregular Algorithms on the Emu Chick

Authors: Eric Hein, Srinivas Eswar, Abdurrahman Yaşar, Jiajia Li, Jeffrey S. Young, Thomas M. Conte, Ümit V. Çatalyürek, Rich Vuduc, Jason Riedy, Bora Uçar

Abstract: The Emu Chick prototype implements migratory memory-side processing in a novel hardware system. Rather than transferring large amounts of data across the system interconnect, the Emu Chick moves lightweight thread contexts to near-memory cores before the beginning of each remote memory read. Previous work has characterized the performance of the Chick prototype in terms of memory bandwidth and pro… ▽ More The Emu Chick prototype implements migratory memory-side processing in a novel hardware system. Rather than transferring large amounts of data across the system interconnect, the Emu Chick moves lightweight thread contexts to near-memory cores before the beginning of each remote memory read. Previous work has characterized the performance of the Chick prototype in terms of memory bandwidth and programming differences from more typical, non-migratory platforms, but there has not yet been an analysis of algorithms on this system. This work evaluates irregular algorithms that could benefit from the lightweight, memory-side processing of the Chick and demonstrates techniques and optimization strategies for achieving performance in sparse matrix-vector multiply operation (SpMV), breadth-first search (BFS), and graph alignment across up to eight distributed nodes encompassing 64 nodelets in the Chick system. We also define and justify relative metrics to compare prototype FPGA-based hardware with established ASIC architectures. The Chick currently supports up to 68x scaling for graph alignment, 80 MTEPS for BFS on balanced graphs, and 50\% of measured STREAM bandwidth for SpMV. △ Less

Submitted 3 December, 2018; originally announced January 2019.

arXiv:1809.07696 [pdf, other]

doi 10.1016/j.parco.2019.04.012

A Microbenchmark Characterization of the Emu Chick

Authors: Jeffrey S. Young, Eric Hein, Srinivas Eswar, Patrick Lavin, Jiajia Li, Jason Riedy, Richard Vuduc, Thomas M. Conte

Abstract: The Emu Chick is a prototype system designed around the concept of migratory memory-side processing. Rather than transferring large amounts of data across power-hungry, high-latency interconnects, the Emu Chick moves lightweight thread contexts to near-memory cores before the beginning of each memory read. The current prototype hardware uses FPGAs to implement cache-less "Gossamer cores for doing… ▽ More The Emu Chick is a prototype system designed around the concept of migratory memory-side processing. Rather than transferring large amounts of data across power-hungry, high-latency interconnects, the Emu Chick moves lightweight thread contexts to near-memory cores before the beginning of each memory read. The current prototype hardware uses FPGAs to implement cache-less "Gossamer cores for doing computational work and a stationary core to run basic operating system functions and migrate threads between nodes. In this multi-node characterization of the Emu Chick, we extend an earlier single-node investigation (Hein, et al. AsHES 2018) of the the memory bandwidth characteristics of the system through benchmarks like STREAM, pointer chasing, and sparse matrix-vector multiplication. We compare the Emu Chick hardware to architectural simulation and an Intel Xeon-based platform. Our results demonstrate that for many basic operations the Emu Chick can use available memory bandwidth more efficiently than a more traditional, cache-based architecture although bandwidth usage suffers for computationally intensive workloads like SpMV. Moreover, the Emu Chick provides stable, predictable performance with up to 65% of the peak bandwidth utilization on a random-access pointer chasing benchmark with weak locality. △ Less

Submitted 31 May, 2019; v1 submitted 7 September, 2018; originally announced September 2018.

Journal ref: Parallel Computing, 2019, ISSN 0167-8191

arXiv:1808.06334 [pdf, other]

Wrangling Rogues: Managing Experimental Post-Moore Architectures

Authors: Will Powell, Jason Riedy, Jeffrey S. Young, Thomas M. Conte

Abstract: The Rogues Gallery is a new experimental testbed that is focused on tackling "rogue" architectures for the Post-Moore era of computing. While some of these devices have roots in the embedded and high-performance computing spaces, managing current and emerging technologies provides a challenge for system administration that are not always foreseen in traditional data center environments. We prese… ▽ More The Rogues Gallery is a new experimental testbed that is focused on tackling "rogue" architectures for the Post-Moore era of computing. While some of these devices have roots in the embedded and high-performance computing spaces, managing current and emerging technologies provides a challenge for system administration that are not always foreseen in traditional data center environments. We present an overview of the motivations and design of the initial Rogues Gallery testbed and cover some of the unique challenges that we have seen and foresee with upcoming hardware prototypes for future post-Moore research. Specifically, we cover the networking, identity management, scheduling of resources, and tools and sensor access aspects of the Rogues Gallery and techniques we have developed to manage these new platforms. △ Less

Submitted 1 August, 2019; v1 submitted 20 August, 2018; originally announced August 2018.

arXiv:1706.10267 [pdf]

Challenges to Kee** the Computer Industry Centered in the US

Authors: Thomas M. Conte, Erik P. Debenedictis, R. Stanley Williams, Mark D. Hill

Abstract: It is undeniable that the worldwide computer industry's center is the US, specifically in Silicon Valley. Much of the reason for the success of Silicon Valley had to do with Moore's Law: the observation by Intel co-founder Gordon Moore that the number of transistors on a microchip doubled at a rate of approximately every two years. According to the International Technology Roadmap for Semiconducto… ▽ More It is undeniable that the worldwide computer industry's center is the US, specifically in Silicon Valley. Much of the reason for the success of Silicon Valley had to do with Moore's Law: the observation by Intel co-founder Gordon Moore that the number of transistors on a microchip doubled at a rate of approximately every two years. According to the International Technology Roadmap for Semiconductors, Moore's Law will end in 2021. How can we rethink computing technology to restart the historic explosive performance growth? Since 2012, the IEEE Rebooting Computing Initiative (IEEE RCI) has been working with industry and the US government to find new computing approaches to answer this question. In parallel, the CCC has held a number of workshops addressing similar questions. This whitepaper summarizes some of the IEEE RCI and CCC findings. The challenge for the US is to lead this new era of computing. Our international competitors are not sitting still: China has invested significantly in a variety of approaches such as neuromorphic computing, chip fabrication facilities, computer architecture, and high-performance simulation and data analytics computing, for example. We must act now, otherwise, the center of the computer industry will move from Silicon Valley and likely move off shore entirely. △ Less

Submitted 30 June, 2017; originally announced June 2017.

Comments: A Computing Community Consortium (CCC) white paper, 3 pages

arXiv:1611.03099 [pdf, ps, other]

A Brief Survey of Non-Residue Based Computational Error Correction

Authors: Sriseshan Srikanth, Bobin Deng, Thomas M. Conte

Abstract: The idea of computational error correction has been around for over half a century. The motivation has largely been to mitigate unreliable devices, manufacturing defects or harsh environments, primarily as a mandatory measure to preserve reliability, or more recently, as a means to lower energy by allowing soft errors to occasionally creep. While residue codes have shown great promise for this pur… ▽ More The idea of computational error correction has been around for over half a century. The motivation has largely been to mitigate unreliable devices, manufacturing defects or harsh environments, primarily as a mandatory measure to preserve reliability, or more recently, as a means to lower energy by allowing soft errors to occasionally creep. While residue codes have shown great promise for this purpose, there have been several orthogonal non-residue based techniques. In this article, we provide a high level outline of some of these non-residual approaches. △ Less

Submitted 9 November, 2016; originally announced November 2016.

Showing 1–10 of 10 results for author: Conte, T M