-
TornadoQSim: An Open-source High-Performance and Modular Quantum Circuit Simulation Framework
Authors:
Ales Kubicek,
Athanasios Stratikopoulos,
Juan Fumero,
Nikos Foutris,
Christos Kotselidis
Abstract:
In this article, we present TornadoQSim, an open-source quantum circuit simulation framework implemented in Java. The proposed framework has been designed to be modular and easily expandable to accommodate different user-defined simulation backends, such as the unitary matrix simulation technique. Furthermore, TornadoQSim features the ability to interchange simulation backends that can simulate arbitrary quantum circuits. Another novel aspect of TornadoQSim compared with other quantum simulators is the transparent hardware acceleration of the simulation backends on heterogeneous devices. TornadoQSim employs TornadoVM to automatically compile parts of the simulation backends onto heterogeneous hardware, thereby addressing the fragmentation in development caused by low-level heterogeneous programming models. The evaluation of TornadoQSim has shown that the transparent utilization of GPU hardware can result in up to 506.5x performance speedup compared to the vanilla Java code for a fully entangled quantum circuit of 11 qubits. The other evaluated quantum algorithms are the Deutsch-Jozsa algorithm (493.10x speedup for an 11-qubit circuit) and the quantum Fourier transform algorithm (518.12x speedup for an 11-qubit circuit). Finally, the best TornadoQSim implementation of the unitary matrix technique has been evaluated against a semantically equivalent simulation via Qiskit. The comparative evaluation has shown that the simulation with TornadoQSim is faster for small circuits, while for large circuits Qiskit outperforms TornadoQSim by an order of magnitude.
Submitted 23 May, 2023;
originally announced May 2023.
-
Experiences in Building a Composable and Functional API for Runtime SPIR-V Code Generation
Authors:
Juan Fumero,
György Rethy,
Athanasios Stratikopoulos,
Nikos Foutris,
Christos Kotselidis
Abstract:
This paper presents the Beehive SPIR-V Toolkit, a framework that can automatically generate a composable and functional Java library for dynamically building SPIR-V binary modules. The Beehive SPIR-V Toolkit can be used by optimizing compilers and runtime systems to generate and validate SPIR-V binary modules from managed runtime systems, such as the Java Virtual Machine (JVM). Furthermore, our framework is architected to accommodate new SPIR-V releases in an easy-to-maintain manner, and it facilitates the automatic generation of Java libraries for other standards besides SPIR-V. The Beehive SPIR-V Toolkit also includes an assembler that emits SPIR-V binary modules from disassembled SPIR-V text files, a disassembler that converts SPIR-V binary code into a text file, and a console client application. To the best of our knowledge, the Beehive SPIR-V Toolkit is the first Java programming framework that can dynamically generate SPIR-V binary modules.
To demonstrate the use of our framework, we showcase the integration of the Beehive SPIR-V Toolkit in the context of TornadoVM, a Java framework for automatically offloading and running Java programs on heterogeneous hardware. We show that, via the Beehive SPIR-V Toolkit, TornadoVM is able to compile code 3x faster than its existing OpenCL C JIT compiler, and that it performs up to 1.52x faster than the existing OpenCL C backend in TornadoVM.
Submitted 18 May, 2023; v1 submitted 16 May, 2023;
originally announced May 2023.
-
Accelerating Java Ray Tracing Applications on Heterogeneous Hardware
Authors:
Vinh Pham Van,
Juan Fumero,
Athanasios Stratikopoulos,
Florin Blanaru,
Christos Kotselidis
Abstract:
Ray tracing has typically been known as a graphics rendering method capable of producing highly realistic computer-generated imagery and visual effects. More recently, the performance improvements in Graphics Processing Units (GPUs) have given developers sufficient computing power to build a fair number of ray tracing applications that can run in real time. Typically, real-time ray tracing is achieved by utilizing high-performance kernels written in CUDA, OpenCL, and Vulkan, which can be invoked by high-level languages via native bindings; this technique fragments application code bases and limits portability.
This paper presents a hardware-accelerated ray tracing rendering engine, fully written in Java, that can seamlessly harness the performance of underlying GPUs via the TornadoVM framework. Through this paper, we show the potential of Java and acceleration frameworks to run a compute-intensive application in real time. Our results indicate that it is possible to enable real-time ray tracing from Java by achieving up to 234, 152, and 45 frames per second at 720p, 1080p, and 4K resolutions, respectively.
Submitted 1 May, 2023;
originally announced May 2023.
-
Transparent Compiler and Runtime Specializations for Accelerating Managed Languages on FPGAs
Authors:
Michail Papadimitriou,
Juan Fumero,
Athanasios Stratikopoulos,
Foivos S. Zakkak,
Christos Kotselidis
Abstract:
In recent years, heterogeneous computing has emerged as a vital way to increase computers' performance and energy efficiency by combining diverse hardware devices, such as Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs). The rationale behind this trend is that different parts of an application can be offloaded from the main CPU to diverse devices, which can efficiently execute these parts as co-processors. FPGAs are among the most widely used co-processors, typically used for accelerating specific workloads due to their flexible hardware and energy-efficient characteristics. These characteristics have made them prevalent in a broad spectrum of computing systems ranging from low-power embedded systems to high-end data centers and cloud infrastructures.
However, these hardware characteristics come at the cost of programmability. Developers who create their applications using high-level programming languages (e.g., Java, Python) are required to familiarize themselves with a hardware description language (e.g., VHDL, Verilog) or, more recently, heterogeneous programming models (e.g., OpenCL, HLS) in order to exploit the co-processors' capacity and tune the performance of their applications. Currently, the above-mentioned heterogeneous programming models exclusively support compilation from compiled languages, such as C and C++. Thus, the integration of heterogeneous co-processors into the software ecosystem of managed programming languages (e.g., Java, Python) is not seamless.
In this paper we rethink the engineering trade-offs that we encountered, in terms of transparency and compilation overheads, while integrating FPGAs into high-level managed programming languages. We present a novel approach that enables runtime code specialization techniques for seamless and high-performance execution of Java programs on FPGAs. The proposed solution is prototyped in the context of the Java programming language and TornadoVM, an open-source programming framework for Java execution on heterogeneous hardware. Finally, we evaluate the proposed solution for FPGA execution against both sequential and multi-threaded Java implementations, showcasing up to 224x and 19.8x performance speedups, respectively, and up to 13.82x compared to TornadoVM running on an Intel integrated GPU. We also provide a breakdown analysis of the proposed compiler optimizations for FPGA execution, as a means to project their impact on the applications' characteristics.
Submitted 30 October, 2020;
originally announced October 2020.
-
Towards High Performance Java-based Deep Learning Frameworks
Authors:
Athanasios Stratikopoulos,
Juan Fumero,
Zoran Sevarac,
Christos Kotselidis
Abstract:
The advent of modern cloud services, along with the huge volume of data produced on a daily basis, has created a demand for fast and efficient data processing. This demand is common among numerous application domains, such as deep learning, data mining, and computer vision. Prior research has focused on employing hardware accelerators as a means to meet this demand. This trend has driven software development to target heterogeneous execution, and several modern computing systems have incorporated a mixture of diverse computing components, including GPUs and FPGAs. However, specializing an application's code for heterogeneous execution is not a trivial task, as it requires developers to have hardware expertise in order to obtain high performance. The vast majority of the existing deep learning frameworks that support heterogeneous acceleration rely on wrapper calls from a high-level programming language to a low-level accelerator backend, such as OpenCL, CUDA, or HLS.
In this paper we have employed TornadoVM, a state-of-the-art heterogeneous programming framework, to transparently accelerate Deep Netts, a Java-based deep learning framework. Our initial results demonstrate up to 8x performance speedup when executing the back propagation process of the network's training on AMD GPUs against the sequential execution of the original Deep Netts framework.
Submitted 13 January, 2020;
originally announced January 2020.