-
QonFusion -- Quantum Approaches to Gaussian Random Variables: Applications in Stable Diffusion and Brownian Motion
Authors:
Shlomo Kashani
Abstract:
In the present study, we delineate a strategy focused on non-parametric quantum circuits for the generation of Gaussian random variables (GRVs). This quantum-centric approach serves as a substitute for conventional pseudorandom number generators (PRNGs), such as the torch.rand function in PyTorch. The principal theme of our research is the incorporation of Quantum Random Number Generators (QRNGs) into classical models of diffusion. Notably, our Quantum Gaussian Random Variable Generator fulfills dual roles, facilitating simulations in both Stable Diffusion (SD) and Brownian Motion (BM). This diverges markedly from prevailing methods that utilize parametric quantum circuits (PQCs), often in conjunction with variational quantum eigensolvers (VQEs). Although conventional techniques can accurately approximate ground states in complex systems or model elaborate probability distributions, they require a computationally demanding optimization process to tune parameters. Our non-parametric strategy obviates this necessity. To facilitate assimilating our methodology into existing computational frameworks, we put forward QonFusion, a Python library congruent with both PyTorch and PennyLane, functioning as a bridge between classical and quantum computational paradigms. We validate QonFusion through extensive statistical testing, including tests which confirm the statistical equivalence of the Gaussian samples from our quantum approach to classical counterparts within defined significance limits. QonFusion is available at https://boltzmannentropy.github.io/qonfusion.github.io/ to reproduce all findings here.
Submitted 28 September, 2023;
originally announced September 2023.
-
HVOX: Scalable Interferometric Synthesis and Analysis of Spherical Sky Maps
Authors:
Sepand Kashani,
Joan Rué Queralt,
Adrian Jarret,
Matthieu Simeoni
Abstract:
Analysis and synthesis are key steps of the radio-interferometric imaging process, serving as a bridge between visibility and sky domains. They can be expressed as partial Fourier transforms involving a large number of non-uniform frequencies and spherically-constrained spatial coordinates. Due to the data non-uniformity, these partial Fourier transforms are computationally expensive and represent a serious bottleneck in the image reconstruction process. The W-gridding algorithm achieves log-linear complexity for both steps by applying a series of 2D non-uniform FFTs (NUFFT) to the data sliced along the so-called $w$ frequency coordinate. A major drawback of this method, however, is its restriction to direction-cosine meshes, which are fundamentally ill-suited for large fields of view. This paper introduces the HVOX gridder, a novel algorithm for analysis/synthesis based on a 3D-NUFFT. Unlike W-gridding, the latter is compatible with arbitrary spherical meshes such as the popular HEALPix scheme for spherical data processing. The 3D-NUFFT allows one to optimally select the size of the inner FFTs, in particular the number of W-planes. This results in a better-performing and auto-tuned algorithm, with controlled accuracy guarantees backed by strong results from approximation theory. To cope with the challenging scale of next-generation radio telescopes, we moreover propose a chunked evaluation strategy: by partitioning the visibility and sky domains, the 3D-NUFFT is decomposed into sub-problems which execute in parallel, while simultaneously cutting memory requirements. Our benchmarking results demonstrate the scalability of HVOX for both SKA and LOFAR, considering state-of-the-art challenging imaging setups. HVOX is moreover computationally competitive with W-gridder, despite the absence of domain-specific optimizations in our implementation.
Submitted 9 June, 2023;
originally announced June 2023.
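To make the cost of the partial Fourier transforms in the HVOX abstract concrete, the sketch below evaluates them directly in NumPy, which takes O(K·M) work for K visibilities and M sky pixels. This is only the brute-force baseline that NUFFT-based gridders are meant to accelerate, not the HVOX algorithm itself. The function names `sky_to_vis` and `vis_to_sky`, the sign convention of the exponent, and the array layouts are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def sky_to_vis(sky, directions, baselines):
    """Direct (non-fast) evaluation of V_k = sum_j I_j exp(-2*pi*i <b_k, r_j>).

    sky:        (M,) pixel intensities
    directions: (M, 3) unit vectors on the sphere (any mesh, e.g. HEALPix centers)
    baselines:  (K, 3) non-uniform frequencies, in wavelengths
    The sign of the exponent is an assumed convention.
    """
    phase = -2j * np.pi * (baselines @ directions.T)  # shape (K, M)
    return np.exp(phase) @ sky

def vis_to_sky(vis, directions, baselines):
    """Adjoint of sky_to_vis: I_j = sum_k V_k exp(+2*pi*i <b_k, r_j>)."""
    phase = 2j * np.pi * (directions @ baselines.T)  # shape (M, K)
    return np.exp(phase) @ vis

# Small random problem: 50 sky directions, 200 baselines.
rng = np.random.default_rng(0)
directions = rng.normal(size=(50, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
baselines = rng.uniform(-20.0, 20.0, size=(200, 3))

point_source = np.zeros(50)
point_source[7] = 1.0
vis = sky_to_vis(point_source, directions, baselines)
back_projection = vis_to_sky(vis, directions, baselines)
```

For a single unit point source every visibility has unit modulus, and the back-projection peaks at the source pixel. Both matrices here are dense, which is exactly the memory and compute bottleneck the abstract describes for large K and M.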
-
Manticore: Hardware-Accelerated RTL Simulation with Static Bulk-Synchronous Parallelism
Authors:
Mahyar Emami,
Sahand Kashani,
Keisuke Kamahori,
Mohammad Sepehr Pourghannad,
Ritik Raj,
James R. Larus
Abstract:
The demise of Moore's Law and Dennard Scaling has revived interest in specialized computer architectures and accelerators. Verification and testing of this hardware depend heavily upon cycle-accurate simulation of register-transfer-level (RTL) designs. The fastest software RTL simulators can simulate designs at 1-1000 kHz, i.e., more than three orders of magnitude slower than hardware. Improved simulators can increase designers' productivity by speeding design iterations and permitting more exhaustive exploration. One possibility is to exploit low-level parallelism, as RTL expresses considerable fine-grain concurrency. Unfortunately, state-of-the-art RTL simulators often perform best on a single core since modern processors cannot effectively exploit fine-grain parallelism. This work presents Manticore: a parallel computer designed to accelerate RTL simulation. Manticore uses a static bulk-synchronous parallel (BSP) execution model to eliminate fine-grain synchronization overhead. It relies entirely on a compiler to schedule resources and communication, which is feasible since RTL code contains few divergent execution paths. With static scheduling, communication and synchronization no longer incur runtime overhead, making fine-grain parallelism practical. Moreover, static scheduling dramatically simplifies processor implementation, significantly increasing the number of cores that fit on a chip. Our 225-core FPGA implementation running at 475 MHz outperforms a state-of-the-art RTL simulator running on desktop and server computers in 8 out of 9 benchmarks.
Submitted 20 October, 2023; v1 submitted 23 January, 2023;
originally announced January 2023.
-
LenslessPiCam: A Hardware and Software Platform for Lensless Computational Imaging with a Raspberry Pi
Authors:
Eric Bezzam,
Sepand Kashani,
Martin Vetterli,
Matthieu Simeoni
Abstract:
Lensless imaging seeks to replace/remove the lens in a conventional imaging system. The earliest cameras were in fact lensless, relying on long exposure times to form images on the other end of a small aperture in a darkened room/container (camera obscura). The introduction of a lens allowed for more light throughput and therefore shorter exposure times, while retaining sharp focus. The incorporation of digital sensors readily enabled the use of computational imaging techniques to post-process and enhance raw images (e.g. via deblurring, inpainting, denoising, sharpening). Recently, imaging scientists have started leveraging computational imaging as an integral part of lensless imaging systems, allowing them to form viewable images from the highly multiplexed raw measurements of lensless cameras (see [5] and references therein for a comprehensive treatment of lensless imaging). This represents a real paradigm shift in camera system design as there is more flexibility to tailor the hardware to the application at hand (e.g. lightweight or flat designs). This increased flexibility comes, however, at the price of more demanding post-processing of the raw digital recordings and a tighter integration of sensing and computation, often difficult to achieve in practice due to inefficient interactions between the various communities of scientists involved. With LenslessPiCam, we provide an easily accessible hardware and software framework to enable researchers, hobbyists, and students to implement and explore practical and computational aspects of lensless imaging. We also provide detailed guides and exercises so that LenslessPiCam can be used as an educational resource, and point to results from our graduate-level signal processing course.
Submitted 3 June, 2022;
originally announced June 2022.
-
A quantum Fourier transform (QFT) based note detection algorithm
Authors:
Shlomo Kashani,
Maryam Alqasemi,
Jacob Hammond
Abstract:
In quantum information processing (QIP), the quantum Fourier transform (QFT) has a plethora of applications [1] [2] [3]: Shor's algorithm and phase estimation are just a few well-known examples. Shor's quantum factorization algorithm, one of the most widely quoted quantum algorithms [4] [5] [6], relies heavily on the QFT and efficiently finds integer prime factors of large numbers on quantum computers [4]. This seminal, ground-breaking design for quantum algorithms has triggered a cascade of viable alternatives to previously unsolvable problems on a classical computer that are potentially superior and can run in polynomial time. In this work we examine the QFT's structure and implementation for the creation of a quantum music note detection algorithm on both a simulated and a real quantum computer. Though formal approaches [7] [1] [8] [9] exist for the verification of quantum algorithms, in this study we limit ourselves to a simpler, symbolic representation which we validate using the symbolic SymPy [10] [11] package, which symbolically replicates quantum computing processes. The algorithm is then implemented as a quantum circuit using IBM's qiskit [12] library, and finally period detection is exemplified on an actual single musical tone using a varying number of qubits.
Submitted 30 April, 2022; v1 submitted 25 April, 2022;
originally announced April 2022.
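The idea of QFT-based period detection on a single tone can be illustrated with a purely classical NumPy simulation. The sketch below is not the authors' qiskit circuit: it assumes the 2^n samples are amplitude-encoded into an n-qubit state, applies the QFT as a unitary matrix-free transform (`np.fft.ifft` with `norm="ortho"`, which matches the common QFT convention with a positive exponent), and reads off the most probable basis state. The function names, the sample rate, and the 440 Hz test tone are illustrative assumptions.

```python
import numpy as np

def qft_amplitudes(state):
    """Classically simulate the QFT on a state vector.

    Uses y_k = (1/sqrt(N)) * sum_j x_j * exp(+2*pi*i*j*k/N), which is the
    orthonormal inverse DFT in NumPy's sign convention.
    """
    return np.fft.ifft(state, norm="ortho")

def detect_tone(samples, sample_rate):
    """Estimate the frequency of a single real tone from 2**n samples."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    if n < 2 or n & (n - 1):
        raise ValueError("number of samples must be a power of two (2**n_qubits)")
    state = samples / np.linalg.norm(samples)      # amplitude encoding
    probs = np.abs(qft_amplitudes(state)) ** 2     # measurement probabilities
    # A real tone yields mirrored peaks at k and N - k; keep the lower half.
    peak = int(np.argmax(probs[: n // 2]))
    return peak * sample_rate / n

# Assumed setup: 8 qubits (256 samples) at 2560 Hz, i.e. 10 Hz per basis state.
n_qubits = 8
sample_rate = 2560.0
t = np.arange(2 ** n_qubits) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)
estimate = detect_tone(tone, sample_rate)
```

The frequency resolution is `sample_rate / 2**n_qubits`, which is one way to see why the number of qubits matters for note detection: each added qubit halves the spacing between distinguishable frequencies at a fixed sample rate.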
-
Deep Learning Interviews: Hundreds of fully solved job interview questions from a wide range of key topics in AI
Authors:
Shlomo Kashani,
Amir Ivry
Abstract:
The second edition of Deep Learning Interviews is home to hundreds of fully-solved problems, from a wide range of key topics in AI. It is designed both to rehearse interview- or exam-specific topics and to provide machine learning MSc/PhD students, and those awaiting an interview, with a well-organized overview of the field. The problems it poses are tough enough to cut your teeth on and to dramatically improve your skills, but they're framed within thought-provoking questions and engaging stories. That is what makes the volume so specifically valuable to students and job seekers: it provides them with the ability to speak confidently and quickly on any relevant topic, to answer technical questions clearly and correctly, and to fully understand the purpose and meaning of interview questions and answers. Those are powerful, indispensable advantages to have when walking into the interview room. The book's contents are a large inventory of numerous topics relevant to DL job interviews and graduate-level exams. That places this work at the forefront of the growing trend in science to teach a core set of practical mathematical and computational skills. It is widely accepted that the training of every computer scientist must include the fundamental theorems of ML, and AI appears in the curriculum of nearly every university. This volume is designed as an excellent reference for graduates of such programs.
Submitted 4 January, 2022; v1 submitted 30 December, 2021;
originally announced January 2022.
-
pyFFS: A Python Library for Fast Fourier Series Computation and Interpolation with GPU Acceleration
Authors:
Eric Bezzam,
Sepand Kashani,
Paul Hurley,
Martin Vetterli,
Matthieu Simeoni
Abstract:
Fourier transforms are an often necessary component in many computational tasks, and can be computed efficiently through the fast Fourier transform (FFT) algorithm. However, many applications involve an underlying continuous signal, and a more natural choice would be to work with e.g. the Fourier series (FS) coefficients in order to avoid the additional overhead of translating between the analog and discrete domains. Unfortunately, there exist very little literature and few tools for the manipulation of FS coefficients from discrete samples. This paper introduces a Python library called pyFFS for efficient FS coefficient computation, convolution, and interpolation. While the libraries SciPy and NumPy provide efficient functionality for discrete Fourier transform coefficients via the FFT algorithm, pyFFS addresses the computation of FS coefficients through what we call the fast Fourier series (FFS). Moreover, pyFFS includes an FS interpolation method based on the chirp Z-transform that can make it more than an order of magnitude faster than the SciPy equivalent when one wishes to perform interpolation. GPU support through the CuPy library allows for further acceleration, e.g. an order of magnitude faster for computing the 2-D FS coefficients of 1000 x 1000 samples and nearly two orders of magnitude faster for 2-D interpolation. As an application, we discuss the use of pyFFS in Fourier optics. pyFFS is available as an open source package at https://github.com/imagingofthings/pyFFS, with documentation at https://pyffs.readthedocs.io.
Submitted 26 September, 2022; v1 submitted 1 October, 2021;
originally announced October 2021.
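The relationship between discrete samples and Fourier series coefficients mentioned in the pyFFS abstract can be sketched with plain NumPy. The example below does not use the pyFFS API; `fs_coefficients` is a hypothetical helper, and it assumes a periodic signal that is bandlimited to harmonics -K..K and sampled uniformly over exactly one period starting at t = 0, with at least 2K + 1 samples. Under those assumptions the FFT divided by the number of samples returns the FS coefficients exactly.

```python
import numpy as np

def fs_coefficients(samples, n_harmonics):
    """Recover FS coefficients c_k, k = -K..K, from uniform samples of one period.

    Assumes x(t) = sum_k c_k * exp(2j*pi*k*t/T) sampled at t_n = n*T/N,
    n = 0..N-1, with N >= 2K + 1 so that harmonics do not alias.
    """
    samples = np.asarray(samples)
    n = samples.size
    if n < 2 * n_harmonics + 1:
        raise ValueError("need at least 2K + 1 samples to avoid aliasing")
    spectrum = np.fft.fft(samples) / n
    k = np.arange(-n_harmonics, n_harmonics + 1)
    # Negative harmonics live at the end of the FFT output (index k mod N).
    return k, spectrum[k % n]

# Test signal with period T = 1: x(t) = 1 + 2*cos(2*pi*t) + sin(6*pi*t).
period = 1.0
n_samples = 9
t = np.arange(n_samples) * period / n_samples
x = 1.0 + 2.0 * np.cos(2 * np.pi * t / period) + np.sin(6 * np.pi * t / period)
harmonics, coeffs = fs_coefficients(x, n_harmonics=3)
```

Here the cosine contributes 1 to each of c_{-1} and c_{1}, and the sine contributes 0.5j to c_{-3} and -0.5j to c_{3}. Sampling grids that do not start at zero or are offset by half a sample need an extra phase correction, which is one of the details a dedicated library handles for the user.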