All-to-all reconfigurability with sparse and higher-order Ising machines

Authors: Srijan Nikhar, Sidharth Kannan, Navid Anjum Aadit, Shuvro Chowdhury, Kerem Y. Camsari

Abstract: Domain-specific hardware to solve computationally hard optimization problems has generated tremendous excitement recently. Here, we evaluate probabilistic bit (p-bit) based Ising Machines (IM), or p-computers, with a benchmark combinatorial optimization problem, namely the 3-regular 3-XOR Satisfiability (3R3X). The 3R3X problem has a glassy energy landscape, and it has recently been used to benchmark various IMs and other solvers. We introduce a multiplexed architecture where p-computers emulate all-to-all (complete) graph functionality despite being interconnected in sparse networks, enabling a highly parallelized chromatic Gibbs sampling. We implement this architecture in FPGAs and show that p-bit networks running an adaptive version of the powerful parallel tempering algorithm (APT) demonstrate competitive algorithmic and prefactor advantages over alternative IMs by D-Wave, Toshiba, and Fujitsu, except a greedy algorithm accelerated on a GPU. We further extend our APT results using higher-order interactions in FPGAs and show that while higher-order interactions lead to prefactor advantages, they do not show any algorithmic scaling advantages for the XORSAT problem, settling an open conjecture. Even though FPGA implementations of p-bits are still not quite as fast as the best possible greedy algorithms implemented in GPUs, scaled magnetic versions of p-computers could lead to orders of magnitude improvement over such algorithms according to experimentally established projections.

Submitted 21 May, 2024; v1 submitted 21 November, 2023; originally announced December 2023.

Comments: The first three authors contributed equally

arXiv:2304.05949 [pdf, other]

doi 10.1038/s41467-024-46645-6

CMOS + stochastic nanomagnets: heterogeneous computers for probabilistic inference and learning

Authors: Nihal Sanjay Singh, Keito Kobayashi, Qixuan Cao, Kemal Selcuk, Tianrui Hu, Shaila Niazi, Navid Anjum Aadit, Shun Kanai, Hideo Ohno, Shunsuke Fukami, Kerem Y. Camsari

Abstract: Extending Moore's law by augmenting complementary-metal-oxide semiconductor (CMOS) transistors with emerging nanotechnologies (X) has become increasingly important. One important class of problems involves sampling-based Monte Carlo algorithms used in probabilistic machine learning, optimization, and quantum simulation. Here, we combine stochastic magnetic tunnel junction (sMTJ)-based probabilistic bits (p-bits) with Field Programmable Gate Arrays (FPGA) to create an energy-efficient CMOS + X (X = sMTJ) prototype. This setup shows how asynchronously driven CMOS circuits controlled by sMTJs can perform probabilistic inference and learning by leveraging the algorithmic update-order-invariance of Gibbs sampling. We show how the stochasticity of sMTJs can augment low-quality random number generators (RNG). Detailed transistor-level comparisons reveal that sMTJ-based p-bits can replace up to 10,000 CMOS transistors while dissipating two orders of magnitude less energy. Integrated versions of our approach can advance probabilistic computing involving deep Boltzmann machines and other energy-based learning algorithms with extremely high throughput and energy efficiency.

Submitted 23 February, 2024; v1 submitted 12 April, 2023; originally announced April 2023.

Journal ref: Nature Communications volume 15, Article number: 2685 (2024)

arXiv:2303.10728 [pdf, other]

doi 10.1038/s41928-024-01182-4

Training Deep Boltzmann Networks with Sparse Ising Machines

Authors: Shaila Niazi, Navid Anjum Aadit, Masoud Mohseni, Shuvro Chowdhury, Yao Qin, Kerem Y. Camsari

Abstract: The slowing down of Moore's law has driven the development of unconventional computing paradigms, such as specialized Ising machines tailored to solve combinatorial optimization problems. In this paper, we show a new application domain for probabilistic bit (p-bit) based Ising machines by training deep generative AI models with them. Using sparse, asynchronous, and massively parallel Ising machines we train deep Boltzmann networks in a hybrid probabilistic-classical computing setup. We use the full MNIST and Fashion MNIST (FMNIST) datasets without any downsampling and a reduced version of the CIFAR-10 dataset in hardware-aware network topologies implemented in moderately sized Field Programmable Gate Arrays (FPGA). For MNIST, our machine using only 4,264 nodes (p-bits) and about 30,000 parameters achieves the same classification accuracy (90%) as an optimized software-based restricted Boltzmann Machine (RBM) with approximately 3.25 million parameters. Similar results follow for FMNIST and CIFAR-10. Additionally, the sparse deep Boltzmann network can generate new handwritten digits and fashion products, a task the 3.25 million parameter RBM fails at despite achieving the same accuracy. Our hybrid computer takes a measured 50 to 64 billion probabilistic flips per second, which is at least an order of magnitude faster than superficially similar Graphics and Tensor Processing Unit (GPU/TPU) based implementations. The massively parallel architecture can comfortably perform the contrastive divergence algorithm (CD-n) with up to n = 10 million sweeps per update, beyond the capabilities of existing software implementations. These results demonstrate the potential of using Ising machines for traditionally hard-to-train deep generative Boltzmann networks, with further possible improvement in nanodevice-based realizations.

Submitted 23 January, 2024; v1 submitted 19 March, 2023; originally announced March 2023.

Journal ref: Nature Electronics (2024)

arXiv:2302.06457 [pdf, other]

doi 10.1109/JXCDC.2023.3256981

A full-stack view of probabilistic computing with p-bits: devices, architectures and algorithms

Authors: Shuvro Chowdhury, Andrea Grimaldi, Navid Anjum Aadit, Shaila Niazi, Masoud Mohseni, Shun Kanai, Hideo Ohno, Shunsuke Fukami, Luke Theogarajan, Giovanni Finocchio, Supriyo Datta, Kerem Y. Camsari

Abstract: The transistor celebrated its 75th birthday in 2022. The scaling of the transistor defined by Moore's Law continues, albeit at a slower pace. Meanwhile, computing demands and energy consumption required by modern artificial intelligence (AI) algorithms have skyrocketed. As an alternative to scaling transistors for general-purpose computing, the integration of transistors with unconventional technologies has emerged as a promising path for domain-specific computing. In this article, we provide a full-stack review of probabilistic computing with p-bits as a representative example of the energy-efficient and domain-specific computing movement. We argue that p-bits could be used to build energy-efficient probabilistic systems, tailored for probabilistic algorithms and applications. From hardware, architecture, and algorithmic perspectives, we outline the main applications of probabilistic computers ranging from probabilistic machine learning and AI to combinatorial optimization and quantum simulation. Combining emerging nanodevices with the existing CMOS ecosystem will lead to probabilistic computers with orders of magnitude improvements in energy efficiency and probabilistic sampling, potentially unlocking previously unexplored regimes for powerful probabilistic algorithms.

Submitted 16 March, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

Journal ref: IEEE Journal on Exploratory Solid-State Computational Devices and Circuits (2023)

arXiv:2205.07402 [pdf, other]

doi 10.1109/NANO54668.2022.9928681

Physics-inspired Ising Computing with Ring Oscillator Activated p-bits

Authors: Navid Anjum Aadit, Andrea Grimaldi, Giovanni Finocchio, Kerem Y. Camsari

Abstract: The nearing end of Moore's Law has been driving the development of domain-specific hardware tailored to solve a special set of problems. Along these lines, probabilistic computing with inherently stochastic building blocks (p-bits) has shown significant promise, particularly in the context of hard optimization and statistical sampling problems. p-bits have been proposed and demonstrated in different hardware substrates ranging from small-scale stochastic magnetic tunnel junctions (sMTJs) in asynchronous architectures to large-scale CMOS in synchronous architectures. Here, we design and implement a truly asynchronous and medium-scale p-computer (with ≈ 800 p-bits) that closely emulates the asynchronous dynamics of sMTJs in Field Programmable Gate Arrays (FPGAs). Using hard instances of the planted Ising glass problem on the Chimera lattice, we evaluate the performance of the asynchronous architecture against an ideal, synchronous design that performs parallelized (chromatic) exact Gibbs sampling. We find that despite the lack of any careful synchronization, the asynchronous design achieves parallelism with algorithmic scaling comparable to that of the ideal, carefully tuned and parallelized synchronous design. Our results highlight the promise of massively scaled p-computers with millions of free-running p-bits made out of nanoscale building blocks such as stochastic magnetic tunnel junctions.

Submitted 15 May, 2022; originally announced May 2022.

Comments: To appear in the 22nd IEEE International Conference on Nanotechnology (IEEE-NANO 2022)

Journal ref: 2022 IEEE 22nd International Conference on Nanotechnology (NANO)

arXiv:2201.12858 [pdf]

doi 10.1103/PhysRevApplied.17.024052

Spintronics-compatible approach to solving maximum satisfiability problems with probabilistic computing, invertible logic and parallel tempering

Authors: Andrea Grimaldi, Luis Sánchez-Tejerina, Navid Anjum Aadit, Stefano Chiappini, Mario Carpentieri, Kerem Camsari, Giovanni Finocchio

Abstract: The search for hardware-compatible strategies for solving NP-hard combinatorial optimization problems (COPs) is an important challenge of today's computing research because of their wide range of applications in real-world optimization problems. Here, we introduce an unconventional scalable approach to tackle maximum satisfiability problems (Max-SAT) which combines probabilistic computing with p-bits, parallel tempering, and the concept of invertible logic gates. We theoretically show the spintronic implementation of this approach based on a coupled set of Landau-Lifshitz-Gilbert equations, showing a potential path for an energy-efficient and very fast (p-bits exhibiting ns time scale switching) architecture for the solution of COPs. The algorithm is benchmarked with hard Max-SAT instances from the 2016 Max-SAT competition (e.g., HG-4SAT-V150-C1350-1.cnf which can be described with 2851 p-bits), including weighted Max-SAT and Max-Cut problems.

Submitted 30 January, 2022; originally announced January 2022.

Comments: 7 Figures, 20 pages

Journal ref: Phys. Rev. Applied 17, 024052 (2022)

arXiv:2110.02481 [pdf, other]

doi 10.1038/s41928-022-00774-2

Massively Parallel Probabilistic Computing with Sparse Ising Machines

Authors: Navid Anjum Aadit, Andrea Grimaldi, Mario Carpentieri, Luke Theogarajan, John M. Martinis, Giovanni Finocchio, Kerem Y. Camsari

Abstract: Inspired by the developments in quantum computing, building domain-specific classical hardware to solve computationally hard problems has received increasing attention. Here, by introducing systematic sparsification techniques, we demonstrate a massively parallel architecture: the sparse Ising Machine (sIM). Exploiting sparsity, sIM achieves ideal parallelism: its key figure of merit, flips per second, scales linearly with the number of probabilistic bits (p-bits) in the system. This makes sIM up to 6 orders of magnitude faster than a CPU implementing standard Gibbs sampling. Compared to optimized implementations in TPUs and GPUs, sIM delivers 5-18x speedup in sampling. In benchmark problems such as integer factorization, sIM can reliably factor semiprimes up to 32 bits, far larger than previous attempts from D-Wave and other probabilistic solvers. Strikingly, sIM beats competition-winning SAT solvers (by 4-700x in runtime to reach 95% accuracy) in solving 3SAT problems. Even when sampling is made inexact using faster clocks, sIM can find the correct ground state with further speedup. The problem encoding and sparsification techniques we introduce can be applied to other Ising Machines (classical and quantum) and the architecture we present can be used for scaling the demonstrated 5,000-10,000 p-bits to 1,000,000 or more through analog CMOS or nanodevices.

Submitted 21 February, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

Journal ref: Nature Electronics (2022)

Showing 1–7 of 7 results for author: Aadit, N A