Search | arXiv e-print repository

Chiplet-Gym: Optimizing Chiplet-based AI Accelerator Design with Reinforcement Learning

Abstract: Modern Artificial Intelligence (AI) workloads demand computing systems with large silicon area to sustain throughput and competitive performance. However, prohibitive manufacturing costs and yield limitations at advanced tech nodes and die-size reaching the reticle limit restrain us from achieving this. With the recent innovations in advanced packaging technologies, chiplet-based architectures have gained significant attention in the AI hardware domain. However, the vast design space of chiplet-based AI accelerator design and the absence of system and package-level co-design methodology make it difficult for the designer to find the optimum design point regarding Power, Performance, Area, and manufacturing Cost (PPAC). This paper presents Chiplet-Gym, a Reinforcement Learning (RL)-based optimization framework to explore the vast design space of chiplet-based AI accelerators, encompassing the resource allocation, placement, and packaging architecture. We analytically model the PPAC of the chiplet-based AI accelerator and integrate it into an OpenAI gym environment to evaluate the design points. We also explore non-RL-based optimization approaches and combine these two approaches to ensure the robustness of the optimizer. The optimizer-suggested design point achieves 1.52X throughput, 0.27X energy, and 0.01X die cost while incurring only 1.62X package cost of its monolithic counterpart at iso-area.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2402.10005 [pdf, other]

ML-ASPA: A Contemplation of Machine Learning-based Acoustic Signal Processing Analysis for Sounds, & Strains Emerging Technology

Authors: Ratul Ali, Aktarul Islam, Md. Shohel Rana, Saila Nasrin, Sohel Afzal Shajol, Professor Dr. A. H. M. Saifullah Sadi

Abstract: Acoustic data serves as a fundamental cornerstone in advancing scientific and engineering understanding across diverse disciplines, spanning biology, communications, and ocean and Earth science. This inquiry meticulously explores recent advancements and transformative potential within the domain of acoustics, specifically focusing on machine learning (ML) and deep learning. ML, comprising an extensive array of statistical techniques, proves indispensable for autonomously discerning and leveraging patterns within data. In contrast to traditional acoustics and signal processing, ML adopts a data-driven approach, unveiling intricate relationships between features and desired labels or actions, as well as among features themselves, given ample training data. The application of ML to expansive sets of training data facilitates the discovery of models elucidating complex acoustic phenomena such as human speech and reverberation. The dynamic evolution of ML in acoustics yields compelling results and holds substantial promise for the future. The advent of electronic stethoscopes and analogous recording and data logging devices has expanded the application of acoustic signal processing concepts to the analysis of bowel sounds. This paper critically reviews existing literature on acoustic signal processing for bowel sound analysis, outlining fundamental approaches and applicable machine learning principles. It chronicles historical progress in signal processing techniques that have facilitated the extraction of valuable information from bowel sounds, emphasizing advancements in noise reduction, segmentation, signal enhancement, feature extraction, sound localization, and machine learning techniques...

Submitted 17 December, 2023; originally announced February 2024.

Comments: 7 pages, 5 figures, Article

MSC Class: 68Qxx; 68Uxx; 68Vxx; 68Wxx; 68Txx; 68-XX

ACM Class: J.7; D.2; G.4

arXiv:2401.17911 [pdf, other]

SNNLP: Energy-Efficient Natural Language Processing Using Spiking Neural Networks

Authors: R. Alexander Knipper, Kaniz Mishty, Mehdi Sadi, Shubhra Kanti Karmaker Santu

Abstract: As spiking neural networks receive more attention, we look toward applications of this computing paradigm in fields other than computer vision and signal processing. One major field, underexplored in the neuromorphic setting, is Natural Language Processing (NLP), where most state-of-the-art solutions still heavily rely on resource-consuming and power-hungry traditional deep learning architectures. Therefore, it is compelling to design NLP models for neuromorphic architectures due to their low energy requirements, with the additional benefit of a more human-brain-like operating model for processing information. However, one of the biggest issues with bringing NLP to the neuromorphic setting is in properly encoding text into a spike train so that it can be seamlessly handled by both current and future SNN architectures. In this paper, we compare various methods of encoding text as spikes and assess each method's performance in an associated SNN on a downstream NLP task, namely, sentiment analysis. Furthermore, we go on to propose a new method of encoding text as spikes that outperforms a widely-used rate-coding technique, Poisson rate-coding, by around 13% on our benchmark NLP tasks. Subsequently, we demonstrate the energy efficiency of SNNs implemented in hardware for the sentiment analysis task compared to traditional deep neural networks, observing an energy efficiency increase of more than 32x during inference and 60x during training while incurring the expected energy-performance tradeoff.

Submitted 31 January, 2024; originally announced January 2024.
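
Illustrative note (not part of the listing): the Poisson rate-coding baseline named in the abstract above is commonly simulated by letting each input neuron fire independently at every time step with a probability equal to its normalized input value. The sketch below shows that common approximation only. The function names, the assumption that text features have already been scaled to [0, 1], and the window length are hypothetical; the paper's own encoders are not described in this listing.

```python
import random


def poisson_rate_encode(values, num_steps, seed=None):
    """Encode values in [0, 1] as binary spike trains (hypothetical helper).

    At every time step, neuron i fires with probability values[i], so the
    expected spike count over the window is values[i] * num_steps. This is
    the usual discrete-time (Bernoulli) approximation of a Poisson process.
    """
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("values must be normalized to [0, 1]")
    rng = random.Random(seed)
    return [
        [1 if rng.random() < v else 0 for _ in range(num_steps)]
        for v in values
    ]


def firing_rates(spike_trains):
    """Return the fraction of time steps in which each neuron fired."""
    return [sum(train) / len(train) for train in spike_trains]


if __name__ == "__main__":
    # Assumed input: three features already scaled to [0, 1].
    trains = poisson_rate_encode([0.0, 0.5, 1.0], num_steps=1000, seed=0)
    print(firing_rates(trains))
```

Longer windows make the empirical firing rate track the input value more closely, at the cost of more time steps per sample.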

arXiv:2309.15186 [pdf, other]

doi 10.1109/TCE.2023.3255411

AsQM: Audio streaming Quality Metric based on Network Impairments and User Preferences

Authors: Marcelo Rodrigo dos Santos, Andreza Patrícia Batista, Renata Lopes Rosa, Muhammad Saadi, Dick Carrillo Melgarejo, Demóstenes Zegarra Rodríguez

Abstract: There are many users of audio streaming services because of the proliferation of cloud-based audio streaming services for different content. The complex networks that support these services do not always guarantee an acceptable quality on the end-user side. In this paper, the impact of temporal interruptions on the reproduction of audio streaming and the users' preference in relation to audio contents are studied. In order to determine the key parameters in the audio streaming service, subjective tests were conducted, and their results show that users' Quality-of-Experience (QoE) is highly correlated with the following application parameters: the number of temporal interruptions or stalls, their frequency and length, and the temporal location in which they occur. Most importantly, experimental results demonstrated that users' preference for audio content plays an important role in users' QoE. Thus, a Preference Factor (PF) function is defined and considered in the formulation of the proposed metric named Audio streaming Quality Metric (AsQM). Considering that multimedia service providers are based on web servers, a framework to obtain user information is proposed. Furthermore, results show that the AsQM implemented in the audio player of an end user's device presents a low impact on energy, processing and memory consumption.

Submitted 26 September, 2023; originally announced September 2023.

Comments: 11 pages

Journal ref: IEEE Transactions on Consumer Electronics, vol. 69, no. 3, pp. 408-420, Aug. 2023

arXiv:2303.12310 [pdf, other]

System and Design Technology Co-optimization of SOT-MRAM for High-Performance AI Accelerator Memory System

Authors: Kaniz Mishty, Mehdi Sadi

Abstract: SoCs are now designed with their own AI accelerator segment to accommodate the ever-increasing demand of Deep Learning (DL) applications. With powerful MAC engines for matrix multiplications, these accelerators show high computing performance. However, because of limited memory resources (i.e., bandwidth and capacity), they fail to achieve optimum system performance during large batch training and inference. In this work, we propose a memory system with high on-chip capacity and bandwidth to shift the gear of AI accelerators from memory-bound to achieving system-level peak performance. We develop the memory system with DTCO-enabled customized SOT-MRAM as large on-chip memory through STCO and detailed characterization of the DL workloads. We evaluate our workload-aware memory system on the CV and NLP benchmarks and observe significant PPA improvement compared to an SRAM-based system in both inference and training modes. Our workload-aware memory system achieves 8X energy and 9X latency improvement on Computer Vision (CV) benchmarks in training and 8X energy and 4.5X latency improvement on Natural Language Processing (NLP) benchmarks in training while consuming only around 50% of SRAM area at iso-capacity.

Submitted 14 November, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

arXiv:2203.05965 [pdf, other]

Human-Like Navigation Behavior: A Statistical Evaluation Framework

Authors: Ian Colbert, Mehdi Saeedi

Abstract: Recent advancements in deep reinforcement learning have brought forth an impressive display of highly skilled artificial agents capable of complex intelligent behavior. In video games, these artificial agents are increasingly deployed as non-playable characters (NPCs) designed to enhance the experience of human players. However, while it has been shown that the convincing human-like behavior of NPCs leads to increased engagement in video games, the believability of an artificial agent's behavior is most often measured solely by its proficiency at a given task. Recent work has hinted that proficiency alone is not sufficient to discern human-like behavior. Motivated by this, we build a non-parametric two-sample hypothesis test designed to compare the behaviors of artificial agents to those of human players. We show that the resulting $p$-value not only aligns with anonymous human judgment of human-like behavior, but also that it can be used as a measure of similarity.

Submitted 9 March, 2022; originally announced March 2022.

arXiv:2111.09488 [pdf, other]

Attacking Deep Learning AI Hardware with Universal Adversarial Perturbation

Authors: Mehdi Sadi, B. M. S. Bahar Talukder, Kaniz Mishty, Md Tauhidur Rahman

Abstract: Universal Adversarial Perturbations are image-agnostic and model-independent noise that, when added to any image, can mislead the trained Deep Convolutional Neural Networks into the wrong prediction. Since these Universal Adversarial Perturbations can seriously jeopardize the security and integrity of practical Deep Learning applications, existing techniques use additional neural networks to detect the existence of these noises at the input image source. In this paper, we demonstrate an attack strategy that, when activated by rogue means (e.g., malware, trojan), can bypass these existing countermeasures by augmenting the adversarial noise at the AI hardware accelerator stage. We demonstrate the accelerator-level universal adversarial noise attack on several deep learning models using co-simulation of the software kernel of Conv2D function and the Verilog RTL model of the hardware under the FuseSoC environment.

Submitted 17 November, 2021; originally announced November 2021.

arXiv:2104.02199 [pdf, other]

Designing Efficient and High-performance AI Accelerators with Customized STT-MRAM

Authors: Kaniz Mishty, Mehdi Sadi

Abstract: In this paper, we demonstrate the design of efficient and high-performance AI/Deep Learning accelerators with customized STT-MRAM and a reconfigurable core. Based on model-driven detailed design space exploration, we present the design methodology of an innovative scratchpad-assisted on-chip STT-MRAM based buffer system for high-performance accelerators. Using an analytically derived expression for the memory occupancy time of AI model weights and activation maps, the volatility of STT-MRAM is adjusted with process and temperature variation aware scaling of thermal stability factor to optimize the retention time, energy, read/write latency, and area of STT-MRAM. From the analysis of modern AI workloads and accelerator implementation in 14nm technology, we verify the efficacy of our designed AI accelerator with STT-MRAM, STT-AI. Compared to an SRAM-based implementation, the STT-AI accelerator achieves 75% area and 3% power savings at iso-accuracy. Furthermore, with a relaxed bit error rate and negligible AI accuracy trade-off, the designed STT-AI Ultra accelerator achieves 75.4% and 3.5% savings in area and power, respectively, over regular SRAM-based accelerators.

Submitted 5 April, 2021; originally announced April 2021.

arXiv:2104.00198 [pdf, other]

True Random Number Generation using Latency Variations of Commercial MRAM Chips

Authors: Farah Ferdaus, B. M. S. Bahar Talukder, Mehdi Sadi, Md Tauhidur Rahman

Abstract: The emerging magneto-resistive RAM (MRAM) has considerable potential to become a universal memory technology because of its several advantages: unlimited endurance, lower read/write latency, ultralow-power operation, high-density, and CMOS compatibility, etc. This paper will demonstrate an effective technique to generate random numbers from energy-efficient consumer-off-the-shelf (COTS) MRAM chips. In the proposed scheme, the inherent (intrinsic/extrinsic process variation) stochastic switching behavior of magnetic tunnel junctions (MTJs) is exploited by manipulating the write latency of COTS MRAM chips. This is the first system-level experimental implementation of true random number generator (TRNG) using COTS toggle MRAM technology to the best of our knowledge. The experimental results and subsequent NIST SP-800-22 suite test reveal that the proposed latency-based TRNG is acceptably fast (~22 Mbit/s in the worst case) and robust over a wide range of operating conditions.

Submitted 31 March, 2021; originally announced April 2021.

Journal ref: 22nd International Symposium for Quality in Electronic Design (ISQED), CA, USA, 2021

arXiv:2103.12166 [pdf, other]

Special Session: Reliability Analysis for ML/AI Hardware

Authors: Shamik Kundu, Kanad Basu, Mehdi Sadi, Twisha Titirsha, Shihao Song, Anup Das, Ujjwal Guin

Abstract: Artificial intelligence (AI) and Machine Learning (ML) are becoming pervasive in today's applications, such as autonomous vehicles, healthcare, aerospace, cybersecurity, and many critical applications. Ensuring the reliability and robustness of the underlying AI/ML hardware is therefore of paramount importance. In this paper, we explore and evaluate the reliability of different AI/ML hardware. The first section outlines the reliability issues in a commercial systolic array-based ML accelerator in the presence of faults arising from device-level non-idealities in the DRAM. Next, we quantify the impact of circuit-level faults in the MSB and LSB logic cones of the Multiply and Accumulate (MAC) block of the AI accelerator on the AI/ML accuracy. Finally, we present two key reliability issues -- circuit aging and endurance in emerging neuromorphic hardware platforms -- and present our system-level approach to mitigate them.

Submitted 29 March, 2021; v1 submitted 22 March, 2021; originally announced March 2021.

Comments: To appear at VLSI Test Symposium

arXiv:2006.14419 [pdf]

A Novel and Reliable Deep Learning Web-Based Tool to Detect COVID-19 Infection from Chest CT-Scan

Authors: Abdolkarim Saeedi, Maryam Saeedi, Arash Maghsoudi

Abstract: The coronavirus has already spread to many countries around the world, and it has taken many lives. Furthermore, the World Health Organization (WHO) has announced that COVID-19 has reached the global epidemic stage. Early and reliable diagnosis using chest CT-scan can assist medical specialists in vital circumstances. In this work, we introduce a computer aided diagnosis (CAD) web service to detect COVID-19 online. One of the largest public chest CT-scan databases, containing 746 participants, was used in this experiment. A number of well-known deep neural network architectures consisting of ResNet, Inception and MobileNet were inspected to find the most efficient model for the hybrid system. A combination of the Densely connected convolutional network (DenseNet) in order to reduce image dimensions and Nu-SVM as an anti-overfitting bottleneck was chosen to distinguish between COVID-19 and healthy controls. The proposed methodology achieved 90.80% recall, 89.76% precision and 90.61% accuracy. The method also yields an AUC of 95.05%. Ultimately a Flask web service is made public through ngrok using the trained models to provide a RESTful COVID-19 detector, which takes only 39 milliseconds to process one image. The source code is also available at https://github.com/KiLJ4EdeN/COVID_WEB. Based on the findings, it can be inferred that it is feasible to use the proposed technique as an automated tool for diagnosis of COVID-19.

Submitted 26 June, 2020; v1 submitted 24 June, 2020; originally announced June 2020.

Comments: 9 pages, 8 figures, Improved English writing in the abstract, introduction and conclusion sections. Removed DenseNet typos underneath figures. Fixed some other minor typos

arXiv:2006.04798 [pdf, other]

Yield Loss Reduction and Test of AI and Deep Learning Accelerators

Authors: Mehdi Sadi, Ujjwal Guin

Abstract: With data-driven analytics becoming mainstream, the global demand for dedicated AI and Deep Learning accelerator chips is soaring. These accelerators, designed with densely packed Processing Elements (PE), are especially vulnerable to the manufacturing defects and functional faults common in the advanced semiconductor process nodes resulting in significant yield loss. In this work, we demonstrate an application-driven methodology of binning the AI accelerator chips, and yield loss reduction by correlating the circuit faults in the PEs of the accelerator with the desired accuracy of the target AI workload. We exploit the inherent fault tolerance features of trained deep learning models and a strategy of selective deactivation of faulty PEs to develop the presented yield loss reduction and test methodology. An analytical relationship is derived between fault location, fault rate, and the AI task's accuracy for deciding if the accelerator chip can pass the final yield test. A yield-loss reduction aware fault isolation, ATPG, and test flow are presented for the multiply and accumulate units of the PEs. Results obtained with widely used AI/deep learning benchmarks demonstrate that the accelerators can sustain 5% fault-rate in PE arrays while suffering from less than 1% accuracy loss, thus enabling product-binning and yield loss reduction of these chips.

Submitted 22 October, 2020; v1 submitted 8 June, 2020; originally announced June 2020.

Comments: Pre-print of IEEE Transactions on CAD Journal

arXiv:1803.04102 [pdf, other]

doi 10.1109/TEST.2017.8242062

Hardware Trojan Detection through Information Flow Security Verification

Authors: Adib Nahiyan, Mehdi Sadi, Rahul Vittal, Gustavo Contreras, Domenic Forte, Mark Tehranipoor

Abstract: Semiconductor design houses are increasingly becoming dependent on third party vendors to procure intellectual property (IP) and meet time-to-market constraints. However, these third party IPs cannot be trusted as hardware Trojans can be maliciously inserted into them by untrusted vendors. While different approaches have been proposed to detect Trojans in third party IPs, their limitations have not been extensively studied. In this paper, we analyze the limitations of the state-of-the-art Trojan detection techniques and demonstrate with experimental results how to defeat these detection mechanisms. We then propose a Trojan detection framework based on information flow security (IFS) verification. Our framework detects violation of IFS policies caused by Trojans without the need of white-box knowledge of the IP. We experimentally validate the efficacy of our proposed technique by accurately identifying Trojans in the trust-hub benchmarks. We also demonstrate that our technique does not share the limitations of the previously proposed Trojan detection techniques.

Submitted 11 March, 2018; originally announced March 2018.

Comments: 10 pages, 8 Figures

Journal ref: 2017 IEEE International Test Conference (ITC), Fort Worth, TX, 2017, pp. 1-10

arXiv:1604.03318 [pdf]

Applying Ontological Modeling on Quranic Nature Domain

Authors: A. B. M. Shamsuzzaman Sadi, Towfique Anam, Mohamed Abdirazak, Abdillahi Hasan Adnan, Sazid Zaman Khan, Mohamed Mahmudur Rahman, Ghassan Samara

Abstract: The holy Quran is the holy book of the Muslims. It contains information about many domains. Often people search for particular concepts of the holy Quran based on the relations among concepts. An ontological modeling of the holy Quran can be useful in such a scenario. In this paper, we have modeled nature related concepts of the holy Quran using OWL (Web Ontology Language) / RDF (Resource Description Framework). Our methodology involves identifying nature related concepts mentioned in the holy Quran and identifying relations among those concepts. These concepts and relations are represented as classes/instances and properties of an OWL ontology. Later, in the results section, it is shown that, using the ontological model, SPARQL queries can retrieve verses and concepts of interest. Thus, this modeling helps semantic search and query on the holy Quran. In this work, we have used the English translation of the holy Quran by Sahih International and the Protege OWL Editor, and for querying we have used SPARQL.

Submitted 12 April, 2016; originally announced April 2016.

Comments: 2016 7th International Conference on Information and Communication Systems (ICICS)

arXiv:1306.4409 [pdf, other]

Energy-Aware Scheme used in Multi-level Heterogeneous Wireless Sensor Networks

Authors: Mostafa Saadi, Moulay Lahcen Hasnaoui, Abderrahim Beni Hssane, Said Benkirane, Mohamed Laghdir

Abstract: A wireless sensor network (WSN) is a power-constrained system, since its nodes run on limited-power batteries, which shortens its life span. The main challenge facing us in the design and conception of Wireless Sensor Networks (WSNs) is to find the best way to extend their life span. The clustering algorithm is a key technique used to increase the scalability and life span of the network in general. In this paper, we propose and evaluate a distributed energy-efficient clustering algorithm for WSNs. This heterogeneous-energy protocol is a new clustering algorithm that decreases the probability of node failure and in which we introduce the node's remaining energy so as to determine the cluster heads. We study the impact of heterogeneity of nodes on WSNs that are hierarchically clustered. Finally, simulation results show that the proposed algorithm increases the life span of the whole network and performs better than LEACH and EEHC according to the metric: first node dies.

Submitted 18 June, 2013; originally announced June 2013.

Comments: 7 pages, 8 Figures

Journal ref: International Journal of Computer Science Issues (IJCSI), 10(1) : pages 96-102, 2013

arXiv:1304.7516 [pdf, ps, other]

Quantum Circuits for GCD Computation with $O(n \log n)$ Depth and O(n) Ancillae

Authors: Mehdi Saeedi, Igor L. Markov

Abstract: GCD computations and variants of the Euclidean algorithm enjoy broad uses in both classical and quantum algorithms. In this paper, we propose quantum circuits for GCD computation with $O(n \log n)$ depth with O(n) ancillae. Prior circuit construction needs $O(n^2)$ running time with O(n) ancillae. The proposed construction is based on the binary GCD algorithm and it benefits from log-depth circuits for 1-bit shift, comparison/subtraction, and managing ancillae. The worst-case gate count remains $O(n^2)$, as in traditional circuits.

Submitted 28 April, 2013; originally announced April 2013.

Comments: 5 pages, 6 figures, 1 table
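
Illustrative note (not part of the listing): the abstract above says the construction is based on the binary GCD algorithm, which replaces division with 1-bit shifts, comparison, and subtraction. The sketch below is the textbook classical version of that algorithm, given only to show which primitive operations are involved. It is not the authors' quantum circuit, and the function name is an assumption.

```python
def binary_gcd(a: int, b: int) -> int:
    """Classical binary (Stein's) GCD using only shifts, compares, subtracts."""
    if a < 0 or b < 0:
        raise ValueError("inputs must be non-negative")
    if a == 0:
        return b
    if b == 0:
        return a

    # Factor out the powers of two shared by both inputs.
    shift = 0
    while ((a | b) & 1) == 0:
        a >>= 1
        b >>= 1
        shift += 1

    # Make a odd; it stays odd for the rest of the loop.
    while (a & 1) == 0:
        a >>= 1

    while b != 0:
        # Remaining factors of two in b cannot be part of the GCD.
        while (b & 1) == 0:
            b >>= 1
        # Keep a <= b, then subtract (odd - odd is even, so b shrinks fast).
        if a > b:
            a, b = b, a
        b -= a

    # Restore the shared powers of two.
    return a << shift


if __name__ == "__main__":
    print(binary_gcd(48, 18))
```

Each loop iteration uses only a shift, a comparison, and a subtraction, which are the three building blocks the abstract names for its log-depth circuits.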

arXiv:1304.0432 [pdf, ps, other]

Constant-Factor Optimization of Quantum Adders on 2D Quantum Architectures

Authors: Mehdi Saeedi, Alireza Shafaei, Massoud Pedram

Abstract: Quantum arithmetic circuits have practical applications in various quantum algorithms. In this paper, we address quantum addition on 2-dimensional nearest-neighbor architectures based on the work presented by Choi and Van Meter (JETC 2012). To this end, we propose new circuit structures for some basic blocks in the adder, and reduce communication overhead by adding concurrency to consecutive blocks and also by parallel execution of expensive Toffoli gates. The proposed optimizations reduce total depth from $140\sqrt n+k_1$ to $92\sqrt n+k_2$ for constants $k_1,k_2$ and affect the computation fidelity considerably.

Submitted 1 April, 2013; originally announced April 2013.

Comments: 10 pages, 11 figures, 3 tables, Conference on Reversible Computation (2013)

arXiv:1303.3557 [pdf, ps, other]

doi 10.1103/PhysRevA.87.062318

Linear-Depth Quantum Circuits for n-qubit Toffoli gates with no Ancilla

Authors: Mehdi Saeedi, Massoud Pedram

Abstract: We design a circuit structure with linear depth to implement an $n$-qubit Toffoli gate. The proposed construction uses a quadratic-size circuit consisting of elementary 2-qubit controlled-rotation gates around the x axis and uses no ancilla qubit. Circuit depth remains linear in quantum technologies with finite-distance interactions between qubits. The suggested construction is related to the long-standing construction by Barenco et al. (Phys. Rev. A, 52: 3457-3467, 1995, arXiv:quant-ph/9503016), which uses a quadratic-size, quadratic-depth quantum circuit for an $n$-qubit Toffoli gate.

Submitted 14 March, 2013; originally announced March 2013.

Comments: 5 pages, 7 figures

arXiv:1302.5382 [pdf, other]

Reversible Logic Synthesis by Quantum Rotation Gates

Authors: Afshin Abdollahi, Mehdi Saeedi, Massoud Pedram

Abstract: A rotation-based synthesis framework for reversible logic is proposed. We develop a canonical representation based on binary decision diagrams and introduce operators to manipulate the developed representation model. Furthermore, a recursive functional bi-decomposition approach is proposed to automatically synthesize a given function. While Boolean reversible logic is particularly addressed, our framework constructs intermediate quantum states that may be in superposition, hence we combine techniques from reversible Boolean logic and quantum computation. The proposed approach results in quadratic gate count for multiple-control Toffoli gates without ancillae, linear depth for quantum carry-ripple adder, and quasilinear size for quantum multiplexer.

Submitted 23 March, 2013; v1 submitted 21 February, 2013; originally announced February 2013.

Comments: 19 pages, 17 figures

Journal ref: A. Abdollahi, M. Saeedi, and M. Pedram, "Reversible Logic Synthesis by Quantum Rotation Gates," Quantum Information and Computation, Vol. 13, No. 9-10, 2013

arXiv:1301.3210 [pdf, other]

doi 10.1103/PhysRevA.87.012310

Faster Quantum Number Factoring via Circuit Synthesis

Authors: Igor L. Markov, Mehdi Saeedi

Abstract: A major obstacle to implementing Shor's quantum number-factoring algorithm is the large size of modular-exponentiation circuits. We reduce this bottleneck by customizing reversible circuits for modular multiplication to individual runs of Shor's algorithm. Our circuit-synthesis procedure exploits spectral properties of multiplication operators and constructs optimized circuits from the traces of the execution of an appropriate GCD algorithm. Empirically, gate counts are reduced by 4-5 times, and circuit latency is reduced by larger factors.

Submitted 14 January, 2013; originally announced January 2013.

Comments: 4 pages, 2 figures, 1 table

Journal ref: Phys. Rev. A, 87: 012310 (2013)

arXiv:1208.5425 [pdf, ps, other]

doi 10.1007/s11128-012-0482-8

Depth-Optimized Reversible Circuit Synthesis

Authors: Mona Arabzadeh, Morteza Saheb Zamani, Mehdi Sedighi, Mehdi Saeedi

Abstract: In this paper, simultaneous reduction of circuit depth and synthesis cost of reversible circuits in quantum technologies with limited interaction is addressed. We developed a cycle-based synthesis algorithm which uses negative controls and limited distance between gate lines. To improve circuit depth, a new parallel structure is introduced in which before synthesis a set of disjoint cycles are extracted from the input specification and distributed into some subsets. The cycles of each subset are synthesized independently on different sets of ancillae. Accordingly, each disjoint set can be synthesized by different synthesis methods. Our analysis shows that the best worst-case synthesis cost of reversible circuits in the linear nearest neighbor architecture is improved by the proposed approach. Our experimental results reveal the effectiveness of the proposed approach to reduce cost and circuit depth for several benchmarks.

Submitted 27 August, 2012; originally announced August 2012.

Comments: 13 pages, 6 figures, 5 tables; Quantum Information Processing (QINP) journal, 2012

Journal ref: Quantum Information Processing: Volume 12, Issue 4 (2013), Page 1677-1699

arXiv:1202.6614 [pdf, ps, other]

Constant-Optimized Quantum Circuits for Modular Multiplication and Exponentiation

Authors: Igor L. Markov, Mehdi Saeedi

Abstract: Reversible circuits for modular multiplication $Cx \bmod M$ with $x<M$ arise as components of modular exponentiation in Shor's quantum number-factoring algorithm. However, existing generic constructions focus on asymptotic gate count and circuit depth rather than actual values, producing fairly large circuits not optimized for specific $C$ and $M$ values. In this work, we develop such optimizations in a bottom-up fashion, starting with most convenient $C$ values. When zero-initialized ancilla registers are available, we reduce the search for compact circuits to a shortest-path problem. Some of our modular-multiplication circuits are asymptotically smaller than previous constructions, but worst-case bounds and average sizes remain $\Theta(n^2)$. In the context of modular exponentiation, we offer several constant-factor improvements, as well as an improvement by a constant additive term that is significant for few-qubit circuits arising in ongoing laboratory experiments with Shor's algorithm.

Submitted 2 April, 2015; v1 submitted 29 February, 2012; originally announced February 2012.

Comments: 29 pages, 9 tables, 19 figures. Minor change: fixed two typos in the abstract and body

Journal ref: Quantum Information and Computation, Vol. 12, No. 5&6, pp. 0361-0394, 2012
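
As a small illustration of why modular multiplication admits a reversible circuit at all: for $C$ coprime to $M$, the map $x \mapsto Cx \bmod M$ on $0 \le x < M$ is a permutation. The sketch below checks this classically; the helper names `modmul_table` and `is_permutation` are assumptions for this example and do not come from the paper, and the handling of register values $x \ge M$ is left out.

```python
from math import gcd


def modmul_table(C: int, M: int) -> list[int]:
    """Return the table x -> (C * x) % M for 0 <= x < M.

    Requires gcd(C, M) == 1, the condition under which the map is invertible.
    """
    if M < 2:
        raise ValueError("M must be at least 2")
    if gcd(C, M) != 1:
        raise ValueError("C must be coprime to M for the map to be reversible")
    return [(C * x) % M for x in range(M)]


def is_permutation(table: list[int]) -> bool:
    """True if the table hits every value in range(len(table)) exactly once."""
    return sorted(table) == list(range(len(table)))
```

For example, `modmul_table(2, 5)` permutes the residues modulo 5, whereas a non-coprime pair such as `C = 3`, `M = 15` is rejected because distinct inputs would collide.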

arXiv:1110.6412 [pdf, ps, other]

doi 10.1007/s11128-010-0201-2

Synthesis of Quantum Circuits for Linear Nearest Neighbor Architectures

Authors: Mehdi Saeedi, Robert Wille, Rolf Drechsler

Abstract: While a couple of impressive quantum technologies have been proposed, they have several intrinsic limitations which must be considered by circuit designers to produce realizable circuits. Limited interaction distance between gate qubits is one of the most common limitations. In this paper, we suggest extensions of the existing synthesis flow aimed to realize circuits for quantum architectures with linear nearest neighbor (LNN) interaction. To this end, a template matching optimization, an exact synthesis approach, and two reordering strategies are introduced. The proposed methods are combined as an integrated synthesis flow. Experiments show that by using the suggested flow, quantum cost can be improved by more than 50% on average.

Submitted 5 September, 2012; v1 submitted 28 October, 2011; originally announced October 2011.

Comments: 14 pages, 11 figures, 3 tables

Journal ref: Quantum Information Processing, Vol. 10, No. 3, pp. 355-377, 2011

arXiv:1110.3969 [pdf]

An Efficient Approach towards Mitigating Soft Errors Risks

Authors: Muhammad Sheikh Sadi, Md. Mizanur Rahman Khan, Md. Nazim Uddin, Jan Jürjens

Abstract: Smaller feature size, higher clock frequency and lower power consumption are core concerns of today's nano-technology, which has resulted from continuous downscaling of CMOS technologies. The resultant 'device shrinking' reduces the soft error tolerance of the VLSI circuits, as very little energy is needed to change their states. Safety critical systems are very sensitive to soft errors. A bit flip due to a soft error can change the value of a critical variable, and consequently the system control flow can be completely changed, which leads to system failure. To minimize soft error risks, a novel methodology is proposed to detect and recover from soft errors considering only 'critical code blocks' and 'critical variables' rather than considering all variables and/or blocks in the whole program. The proposed method reduces space and time overhead in comparison to existing dominant approaches.

Submitted 17 October, 2011; originally announced October 2011.

Comments: Signal & Image Processing: An International Journal(SIPIJ) Vol. 2, No. 3, September 2011

arXiv:1110.2574 [pdf, ps, other]

doi 10.1145/2431211.2431220

Synthesis and Optimization of Reversible Circuits - A Survey

Authors: Mehdi Saeedi, Igor L. Markov

Abstract: Reversible logic circuits have been historically motivated by theoretical research in low-power electronics as well as practical improvement of bit-manipulation transforms in cryptography and computer graphics. Recently, reversible circuits have attracted interest as components of quantum algorithms, as well as in photonic and nano-computing technologies where some switching devices offer no signal gain. Research in generating reversible logic distinguishes between circuit synthesis, post-synthesis optimization, and technology mapping. In this survey, we review algorithmic paradigms --- search-based, cycle-based, transformation-based, and BDD-based --- as well as specific algorithms for reversible synthesis, both exact and heuristic. We conclude the survey by outlining key open challenges in synthesis of reversible and quantum logic, as well as most common misconceptions.

Submitted 20 March, 2013; v1 submitted 12 October, 2011; originally announced October 2011.

Comments: 34 pages, 15 figures, 2 tables

ACM Class: B.6.3; B.6.1

Journal ref: M. Saeedi and I. L. Markov, "Synthesis and Optimization of Reversible Circuits - A Survey", ACM Computing Surveys, 45, 2, Article 21 (34 pages), 2013

arXiv:1103.0215 [pdf, ps, other]

doi 10.1109/TCAD.2011.2105555

Reversible Circuit Optimization via Leaving the Boolean Domain

Authors: Dmitri Maslov, Mehdi Saeedi

Abstract: For years, the quantum/reversible circuit community has been convinced that: a) the addition of auxiliary qubits is instrumental in constructing a smaller quantum circuit; and, b) the introduction of quantum gates inside reversible circuits may result in more efficient designs. This paper presents a systematic approach to optimizing reversible (and quantum) circuits via the introduction of auxiliary qubits and quantum gates inside circuit designs. This advances our understanding of what may be achieved with a) and b).

Submitted 29 July, 2011; v1 submitted 1 March, 2011; originally announced March 2011.

Comments: 14 pages, 8 figures

ACM Class: B.6; B.6.3

Journal ref: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 30(6):806-816, 2011

arXiv:1011.2159 [pdf, other]

Block-based quantum-logic synthesis

Authors: Mehdi Saeedi, Mona Arabzadeh, Morteza Saheb Zamani, Mehdi Sedighi

Abstract: In this paper, the problem of constructing an efficient quantum circuit for the implementation of an arbitrary quantum computation is addressed. To this end, a basic block based on the cosine-sine decomposition method is suggested which contains $l$ qubits. In addition, a previously proposed quantum-logic synthesis method based on quantum Shannon decomposition is recursively applied to reach unitary gates over $l$ qubits. Then, the basic block is used and some optimizations are applied to remove redundant gates. It is shown that the exact value of $l$ affects the number of one-qubit and CNOT gates in the proposed method. In comparison to the previous synthesis methods, the value of $l$ is examined consequently to improve either the number of CNOT gates or the total number of gates. The proposed approach is further analyzed by considering the nearest neighbor limitation. According to our evaluation, the number of CNOT gates is increased by at most a factor of $\frac{5}{3}$ if the nearest neighbor interaction is applied.

Submitted 9 November, 2010; originally announced November 2010.

Comments: 15 pages, 8 figures, 5 tables, Quantum Information and Computation (QIC) Journal

arXiv:1004.4320 [pdf, other]

doi 10.1145/1877745.1877747

Reversible Circuit Synthesis Using a Cycle-Based Approach

Authors: Mehdi Saeedi, Morteza Saheb Zamani, Mehdi Sedighi, Zahra Sasanian

Abstract: Reversible logic has applications in various research areas including signal processing, cryptography and quantum computation. In this paper, direct NCT-based synthesis of a given $k$-cycle in a cycle-based synthesis scenario is examined. To this end, a set of seven building blocks is proposed that reveals the potential of direct synthesis of a given permutation to reduce both quantum cost and average runtime. To synthesize a given large cycle, we propose a decomposition algorithm to extract the suggested building blocks from the input specification. Then, a synthesis method is introduced which uses the building blocks and the decomposition algorithm. Finally, a hybrid synthesis framework is suggested which uses the proposed cycle-based synthesis method in conjunction with one of the recent NCT-based synthesis approaches which is based on Reed-Muller (RM) spectra. The time complexity and the effectiveness of the proposed synthesis approach are analyzed in detail. Our analyses show that the proposed hybrid framework leads to a better quantum cost in the worst-case scenario compared to the previously presented methods. The proposed framework always converges and typically synthesizes a given specification very fast compared to the available synthesis algorithms. Besides, the quantum costs of benchmark functions are improved by about 20% on average (55% in the best case).

Submitted 27 December, 2010; v1 submitted 24 April, 2010; originally announced April 2010.

Comments: 25 pages, 21 figures, 2 tables

Journal ref: ACM Journal of Emerging Technologies in Computing Systems, Vol. 6, Issue 4, Article 13, December 2010

arXiv:1004.1755 [pdf, other]

Rule-Based Optimization of Reversible Circuits

Authors: Mona Arabzadeh, Mehdi Saeedi, Morteza Saheb Zamani

Abstract: Reversible logic has applications in various research areas including low-power design and quantum computation. In this paper, a rule-based optimization approach for reversible circuits is proposed which uses both negative and positive control Toffoli gates during the optimization. To this end, a set of rules for removing NOT gates and optimizing sub-circuits with common-target gates is proposed. To evaluate the proposed approach, the best-reported synthesized circuits and the results of a recent synthesis algorithm which uses both negative and positive controls are used. Our experiments reveal the potential of the proposed approach in optimizing synthesized circuits.

Submitted 10 April, 2010; originally announced April 2010.

Comments: 12 pages, 15 figures, Asia and South Pacific Design Automation Conference, 2010

Journal ref: Mona Arabzadeh, Mehdi Saeedi, Morteza Saheb Zamani, "Rule-Based Optimization of Reversible Circuits," The 15th Asia and South Pacific Design Automation Conference (ASPDAC), pp. 849 - 854, 2010

arXiv:1004.1697 [pdf, other]

doi 10.1016/j.mejo.2010.02.002

A Library-Based Synthesis Methodology for Reversible Logic

Authors: Mehdi Saeedi, Mehdi Sedighi, Morteza Saheb Zamani

Abstract: In this paper, a library-based synthesis methodology for reversible circuits is proposed where a reversible specification is considered as a permutation comprising a set of cycles. To this end, a pre-synthesis optimization step is introduced to construct a reversible specification from an irreversible function. In addition, a cycle-based representation model is presented to be used as an intermediate format in the proposed synthesis methodology. The selected intermediate format serves as a focal point for all potential representation models. In order to synthesize a given function, a library containing seven building blocks is used where each building block is a cycle of length less than 6. To synthesize large cycles, we also propose a decomposition algorithm which produces all possible minimal and inequivalent factorizations for a given cycle of length greater than 5. All decompositions contain the maximum number of disjoint cycles. The generated decompositions are used in conjunction with a novel cycle assignment algorithm which is proposed based on the graph matching problem to select the best possible cycle pairs. Then, each pair is synthesized by using the available components of the library. The decomposition algorithm together with the cycle assignment method are considered as a binding method which selects a building block from the library for each cycle. Finally, a post-synthesis optimization step is introduced to optimize the synthesis results in terms of different costs.

Submitted 10 April, 2010; originally announced April 2010.

Comments: 24 pages, 8 figures, Microelectronics Journal, Elsevier

Journal ref: Mehdi Saeedi, Mehdi Sedighi, Morteza Saheb Zamani, "A Library-Based Synthesis Methodology for Reversible Logic," Microelectronics Journal, Elsevier, Volume 41, No. 4, pp. 185-194, 2010.
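
The abstract above treats a reversible specification as a permutation made up of cycles. A minimal sketch of that first step, splitting a permutation into its disjoint cycles, might look like the following. It assumes the specification is given as a list `perm` where `perm[i]` is the output for input `i`; the function name `disjoint_cycles` is hypothetical, and the paper's own decomposition into library building blocks is not reproduced here.

```python
def disjoint_cycles(perm: list[int]) -> list[tuple[int, ...]]:
    """Split a permutation of 0..n-1 into disjoint cycles, omitting fixed points."""
    n = len(perm)
    if sorted(perm) != list(range(n)):
        raise ValueError("perm must be a permutation of 0..n-1")

    seen = [False] * n
    cycles = []
    for start in range(n):
        if seen[start] or perm[start] == start:
            continue  # already covered, or a fixed point (trivial 1-cycle)
        cycle = []
        x = start
        # Follow the mapping until we return to an element already visited.
        while not seen[x]:
            seen[x] = True
            cycle.append(x)
            x = perm[x]
        cycles.append(tuple(cycle))
    return cycles
```

For instance, the 3-bit-free toy permutation `[1, 2, 0, 3, 5, 4]` splits into the cycles `(0, 1, 2)` and `(4, 5)`, with `3` left out as a fixed point.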

Showing 1–30 of 30 results for author: Saadi, M