-
FeReX: A Reconfigurable Design of Multi-bit Ferroelectric Compute-in-Memory for Nearest Neighbor Search
Authors:
Zhicheng Xu,
Che-Kai Liu,
Chao Li,
Ruibin Mao,
Jianyi Yang,
Thomas Kämpfe,
Mohsen Imani,
Can Li,
Cheng Zhuo,
Xunzhao Yin
Abstract:
Rapid advancements in artificial intelligence have given rise to transformative models, profoundly impacting our lives. These models demand massive volumes of data to operate effectively, exacerbating the data-transfer bottleneck inherent in the conventional von-Neumann architecture. Compute-in-memory (CIM), a novel computing paradigm, tackles these issues by seamlessly embedding in-memory search functions, thereby obviating the need for data transfers. However, existing non-volatile memory (NVM)-based accelerators are application specific. During the similarity based associative search operation, they only support a single, specific distance metric, such as Hamming, Manhattan, or Euclidean distance in measuring the query against the stored data, calling for reconfigurable in-memory solutions adaptable to various applications. To overcome such a limitation, in this paper, we present FeReX, a reconfigurable associative memory (AM) that accommodates various distance metrics including Hamming, Manhattan, and Euclidean distances. Leveraging multi-bit ferroelectric field-effect transistors (FeFETs) as the proxy and a hardware-software co-design approach, we introduce a constrained satisfaction problem (CSP)-based method to automate AM search input voltage and stored voltage configurations for different distance based search functions. Device-circuit co-simulations first validate the effectiveness of the proposed FeReX methodology for reconfigurable search distance functions. Then, we benchmark FeReX in the context of k-nearest neighbor (KNN) and hyperdimensional computing (HDC), which highlights the robustness of FeReX and demonstrates up to 250x speedup and 10^4 energy savings compared with GPU.
Submitted 11 January, 2024;
originally announced January 2024.
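The FeReX abstract above turns on the fact that Hamming, Manhattan, and Euclidean distances can rank the same stored entries differently. The following is a minimal software sketch of that idea only, not the FeReX hardware or its CSP-based voltage configuration; the function names `distance` and `nearest` and the sample 2-bit data are hypothetical.

```python
import math

def distance(query, stored, metric):
    """Distance between a query vector and one stored vector (hypothetical helper)."""
    diffs = [abs(q - s) for q, s in zip(query, stored)]
    if metric == "hamming":
        # Number of positions where the two vectors differ.
        return sum(1 for d in diffs if d != 0)
    if metric == "manhattan":
        # Sum of absolute element-wise differences.
        return sum(diffs)
    if metric == "euclidean":
        # Square root of the sum of squared differences.
        return math.sqrt(sum(d * d for d in diffs))
    raise ValueError(f"unknown metric: {metric}")

def nearest(query, rows, metric):
    """Index of the stored row closest to the query under the chosen metric."""
    return min(range(len(rows)), key=lambda i: distance(query, rows[i], metric))

# Illustrative 2-bit entries (values 0..3); not data from the paper.
rows = [[0, 0, 3, 3],
        [1, 1, 1, 0],
        [2, 0, 0, 0]]
query = [0, 0, 0, 0]

for metric in ("hamming", "manhattan", "euclidean"):
    print(metric, nearest(query, rows, metric))
```

With this sample data the Euclidean winner differs from the Hamming and Manhattan winner, which is the kind of application-dependent behavior a reconfigurable search would need to cover.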
-
Reconfigurable Frequency Multipliers Based on Complementary Ferroelectric Transistors
Authors:
Haotian Xu,
Jianyi Yang,
Cheng Zhuo,
Thomas Kämpfe,
Kai Ni,
Xunzhao Yin
Abstract:
Frequency multipliers, a class of essential electronic components, play a pivotal role in contemporary signal processing and communication systems. They serve as crucial building blocks for generating high-frequency signals by multiplying the frequency of an input signal. However, traditional frequency multipliers that rely on nonlinear devices often require energy- and area-consuming filtering and amplification circuits, and emerging designs based on an ambipolar ferroelectric transistor require costly non-trivial characteristic tuning or complex technology process. In this paper, we show that a pair of standard ferroelectric field effect transistors (FeFETs) can be used to build compact frequency multipliers without aforementioned technology issues. By leveraging the tunable parabolic shape of the 2FeFET structures' transfer characteristics, we propose four reconfigurable frequency multipliers, which can switch between signal transmission and frequency doubling. Furthermore, based on the 2FeFET structures, we propose four frequency multipliers that realize triple, quadruple frequency modes, elucidating a scalable methodology to generate more multiplication harmonics of the input frequency. Performance metrics such as maximum operating frequency, power, etc., are evaluated and compared with existing works. We also implement a practical case of frequency modulation scheme based on the proposed reconfigurable multipliers without additional devices. Our work provides a novel path of scalable and reconfigurable frequency multiplier designs based on devices that have characteristics similar to FeFETs, and show that FeFETs are a promising candidate for signal processing and communication systems in terms of maximum operating frequency and power.
Submitted 28 December, 2023;
originally announced December 2023.
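Why a parabolic transfer characteristic doubles frequency can be seen from a short identity. The sketch below assumes an idealized characteristic $I_D = a\,(V_G - V_0)^2$ and a sinusoidal input biased at the vertex; the symbols $a$, $V_0$, $A$, and $\omega$ are illustrative and not taken from the paper.

```latex
\begin{aligned}
V_G(t) &= V_0 + A\cos(\omega t), \\
I_D(t) &= a\,\bigl(V_G(t) - V_0\bigr)^2 = a A^2 \cos^2(\omega t)
        = \frac{a A^2}{2}\,\bigl(1 + \cos(2\omega t)\bigr).
\end{aligned}
```

Under these assumptions the output contains only a DC term and a component at $2\omega$, with no term at the input frequency.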
-
A Ferroelectric Compute-in-Memory Annealer for Combinatorial Optimization Problems
Authors:
Xunzhao Yin,
Yu Qian,
Alptekin Vardar,
Marcel Gunther,
Franz Muller,
Nellie Laleni,
Zijian Zhao,
Zhouhang Jiang,
Zhiguo Shi,
Yiyu Shi,
Xiao Gong,
Cheng Zhuo,
Thomas Kampfe,
Kai Ni
Abstract:
Computationally hard combinatorial optimization problems (COPs) are ubiquitous in many applications, including logistical planning, resource allocation, chip design, drug explorations, and more. Due to their critical significance and the inability of conventional hardware in efficiently handling scaled COPs, there is a growing interest in developing computing hardware tailored specifically for COPs, including digital annealers, dynamical Ising machines, and quantum/photonic systems. However, significant hurdles still remain, such as the memory access issue, the system scalability and restricted applicability to certain types of COPs, and VLSI-incompatibility, respectively. Here, a ferroelectric field effect transistor (FeFET) based compute-in-memory (CiM) annealer is proposed. After converting COPs into quadratic unconstrained binary optimization (QUBO) formulations, a hardware-algorithm co-design is conducted, yielding an energy-efficient, versatile, and scalable hardware for COPs. To accelerate the core vector-matrix-vector (VMV) multiplication of QUBO formulations, a FeFET based CiM array is exploited, which can accelerate the intended operation in-situ due to its unique three-terminal structure. In particular, a lossless compression technique is proposed to prune typically sparse QUBO matrix to reduce hardware cost. Furthermore, a multi-epoch simulated annealing (MESA) algorithm is proposed to replace conventional simulated annealing for its faster convergence and better solution quality. The effectiveness of the proposed techniques is validated through the utilization of developed chip prototypes for successfully solving graph coloring problem, indicating great promise of FeFET CiM annealer in solving general COPs.
Submitted 24 September, 2023;
originally announced September 2023.
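The annealer abstract above centers on the QUBO energy, a vector-matrix-vector product $x^T Q x$ over binary $x$. The following is a minimal sketch of that energy together with conventional simulated annealing; it is not the paper's MESA algorithm or its compression technique, and the function names, cooling schedule, and the two-variable example matrix are all assumptions made for illustration.

```python
import math
import random

def qubo_energy(Q, x):
    """Vector-matrix-vector product x^T Q x for a binary vector x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def simulated_annealing(Q, steps=2000, t_start=2.0, t_end=0.01, seed=0):
    """Plain single-bit-flip simulated annealing (hypothetical parameters)."""
    rng = random.Random(seed)
    n = len(Q)
    x = [rng.randint(0, 1) for _ in range(n)]
    energy = qubo_energy(Q, x)
    best_x, best_energy = x[:], energy
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i = rng.randrange(n)
        x[i] ^= 1  # propose flipping one bit
        new_energy = qubo_energy(Q, x)
        delta = new_energy - energy
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            energy = new_energy  # accept the move
            if energy < best_energy:
                best_x, best_energy = x[:], energy
        else:
            x[i] ^= 1  # reject: undo the flip
    return best_x, best_energy

# "Pick exactly one of two" encoded as the penalty (x0 + x1 - 1)^2 with the
# constant term dropped, so the two valid assignments have energy -1.
Q = [[-1, 2],
     [0, -1]]
best_x, best_energy = simulated_annealing(Q)
print(best_x, best_energy)
```

Recomputing the full energy on every flip keeps the sketch short; a practical implementation would update the energy incrementally from the flipped row and column of `Q`.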
-
Embedding Security into Ferroelectric FET Array via In-Situ Memory Operation
Authors:
Yixin Xu,
Yi Xiao,
Zijian Zhao,
Franz Müller,
Alptekin Vardar,
Xiao Gong,
Sumitha George,
Thomas Kämpfe,
Vijaykrishnan Narayanan,
Kai Ni
Abstract:
Non-volatile memories (NVMs) have the potential to reshape next-generation memory systems because of their promising properties of near-zero leakage power consumption, high density and non-volatility. However, NVMs also face critical security threats that exploit the non-volatile property. Compared to volatile memory, the capability of retaining data even after power down makes NVM more vulnerable. Existing solutions to address the security issues of NVMs are mainly based on Advanced Encryption Standard (AES), which incurs significant performance and power overhead. In this paper, we propose a lightweight memory encryption/decryption scheme by exploiting in-situ memory operations with negligible overhead. To validate the feasibility of the encryption/decryption scheme, device-level and array-level experiments are performed using ferroelectric field effect transistor (FeFET) as an example NVM without loss of generality. Besides, a comprehensive evaluation is performed on a 128x128 FeFET AND-type memory array in terms of area, latency, power and throughput. Compared with the AES-based scheme, our scheme shows around 22.6x/14.1x increase in encryption/decryption throughput with negligible power penalty. Furthermore, we evaluate the performance of our scheme over the AES-based scheme when deploying different neural network workloads. Our scheme yields significant latency reduction by 90% on average for encryption and decryption processes.
Submitted 2 June, 2023;
originally announced June 2023.
-
A Homogeneous Processing Fabric for Matrix-Vector Multiplication and Associative Search Using Ferroelectric Time-Domain Compute-in-Memory
Authors:
Xunzhao Yin,
Qingrong Huang,
Franz Müller,
Shan Deng,
Alptekin Vardar,
Sourav De,
Zhouhang Jiang,
Mohsen Imani,
Cheng Zhuo,
Thomas Kämpfe,
Kai Ni
Abstract:
In this work, we propose a ferroelectric FET (FeFET) time-domain compute-in-memory (TD-CiM) array as a homogeneous processing fabric for binary multiplication-accumulation (MAC) and content addressable memory (CAM). We demonstrate that: i) the XOR(XNOR)/AND logic function can be realized using a single cell composed of 2FeFETs connected in series; ii) a two-phase computation in an inverter chain with each stage featuring the XOR/AND cell to control the associated capacitor loading and the computation results of binary MAC and CAM are reflected in the chain output signal delay, illustrating full digital compatibility; iii) comprehensive theoretical and experimental validation of the proposed 2FeFET cell and inverter delay chains and their robustness against FeFET variation; iv) the homogeneous processing fabric is applied in hyperdimensional computing to show dynamic and fine-grain resource allocation to accommodate different tasks requiring varying demands over the binary MAC and CAM resources.
Submitted 24 September, 2022;
originally announced September 2022.
-
Ferroelectric FET-based strong physical unclonable function: a low-power, high-reliable and reconfigurable solution for Internet-of-Things security
Authors:
Xinrui Guo,
Xiaoyang Ma,
Franz Muller,
Kai Ni,
Thomas Kampfe,
Yongpan Liu,
Vijaykrishnan Narayanan,
Xueqing Li
Abstract:
Hardware security has been a key concern in modern information technologies. Especially, as the number of Internet-of-Things (IoT) devices grows rapidly, to protect the device security with low-cost security primitives becomes essential, among which Physical Unclonable Function (PUF) is a widely-used solution. In this paper, we propose the first FeFET-based strong PUF exploiting the cycle-to-cycle (C2C) variation of FeFETs as the entropy source. Based on the experimental measurements, the proposed PUF shows satisfying performance including high uniformity, uniqueness, reconfigurability and reliability. To resist machine-learning attack, XOR structure was introduced, and simulations show that our proposed PUF has similar resistance to existing attack models with traditional arbiter PUFs. Furthermore, our design is shown to be power-efficient, and highly robust to write voltage, temperature and device size, which makes it a competitive security solution for Internet-of-Things edge devices.
Submitted 31 August, 2022;
originally announced August 2022.
-
An Ultra-Compact Single FeFET Binary and Multi-Bit Associative Search Engine
Authors:
Xunzhao Yin,
Franz Müller,
Qingrong Huang,
Chao Li,
Mohsen Imani,
Zeyu Yang,
Jiahao Cai,
Maximilian Lederer,
Ricardo Olivo,
Nellie Laleni,
Shan Deng,
Zijian Zhao,
Cheng Zhuo,
Thomas Kämpfe,
Kai Ni
Abstract:
Content addressable memory (CAM) is widely used in associative search tasks for its highly parallel pattern matching capability. To accommodate the increasingly complex and data-intensive pattern matching tasks, it is critical to keep improving the CAM density to enhance the performance and area efficiency. In this work, we demonstrate: i) a novel ultra-compact 1FeFET CAM design that enables parallel associative search and in-memory Hamming distance calculation; ii) a multi-bit CAM for exact search using the same CAM cell; iii) compact device designs that integrate the series resistor current limiter into the intrinsic FeFET structure to turn the 1FeFET1R into an effective 1FeFET cell; iv) a successful 2-step search operation and a sufficient sensing margin of the proposed binary and multi-bit 1FeFET1R CAM array with sizes of practical interests in both experiments and simulations, given the existing unoptimized FeFET device variation; v) 89.9x speedup and 66.5x energy efficiency improvement over the state-of-the-art alignment tools on GPU in accelerating genome pattern matching applications through the hyperdimensional computing paradigm.
Submitted 15 March, 2022;
originally announced March 2022.
-
Deep Random Forest with Ferroelectric Analog Content Addressable Memory
Authors:
Xunzhao Yin,
Franz Müller,
Ann Franchesca Laguna,
Chao Li,
Wenwen Ye,
Qingrong Huang,
Qinming Zhang,
Zhiguo Shi,
Maximilian Lederer,
Nellie Laleni,
Shan Deng,
Zijian Zhao,
Michael Niemier,
Xiaobo Sharon Hu,
Cheng Zhuo,
Thomas Kämpfe,
Kai Ni
Abstract:
Deep random forest (DRF), which incorporates the core features of deep learning and random forest (RF), exhibits comparable classification accuracy, interpretability, and low memory and computational overhead when compared with deep neural networks (DNNs) in various information processing tasks for edge intelligence. However, the development of efficient hardware to accelerate DRF is lagging behind its DNN counterparts. The key for hardware acceleration of DRF lies in efficiently realizing the branch-split operation at decision nodes when traversing a decision tree. In this work, we propose to implement DRF through simple associative searches realized with ferroelectric analog content addressable memory (ACAM). Utilizing only two ferroelectric field effect transistors (FeFETs), the ultra-compact ACAM cell can perform a branch-split operation with an energy-efficient associative search by storing the decision boundaries as the analog polarization states in an FeFET. The DRF accelerator architecture and the corresponding mapping of the DRF model to the ACAM arrays are presented. The functionality, characteristics, and scalability of the FeFET ACAM based DRF and its robustness against FeFET device non-idealities are validated both in experiments and simulations. Evaluation results show that the FeFET ACAM DRF accelerator exhibits 10^6x/16x and 10^6x/2.5x improvements in terms of energy and latency when compared with other deep random forest hardware implementations on the state-of-the-art CPU/ReRAM, respectively.
Submitted 6 October, 2021;
originally announced October 2021.
-
Alleviation of Temperature Variation Induced Accuracy Degradation in Ferroelectric FinFET Based Neural Network
Authors:
Sourav De,
Hoang-Hiep Le,
Md. Aftab Baig,
Yao-Jen Lee,
Darsen D. Lu,
Thomas Kämpfe
Abstract:
This paper reports the impacts of temperature variation on the inference accuracy of pre-trained all-ferroelectric FinFET deep neural networks, along with plausible design techniques to abate these impacts. We adopted a pre-trained artificial neural network (N.N.) with 96.4% inference accuracy on the MNIST dataset as the baseline. As an aftermath of temperature change, a compact model captured the conductance drift of a programmed cell over a wide range of gate biases. We observed a significant inference accuracy degradation in the analog neural network at 233 K for an N.N. trained at 300 K. Finally, we deployed binary neural networks with "read voltage" optimization to ensure immunity of N.N. to accuracy degradation under temperature variation, maintaining an inference accuracy of 96%. Keywords: Ferroelectric memories
Submitted 15 August, 2022; v1 submitted 3 March, 2021;
originally announced March 2021.
-
In-Memory Nearest Neighbor Search with FeFET Multi-Bit Content-Addressable Memories
Authors:
Arman Kazemi,
Mohammad Mehdi Sharifi,
Ann Franchesca Laguna,
Franz Müller,
Ramin Rajaei,
Ricardo Olivo,
Thomas Kämpfe,
Michael Niemier,
X. Sharon Hu
Abstract:
Nearest neighbor (NN) search is an essential operation in many applications, such as one/few-shot learning and image classification. As such, fast and low-energy hardware support for accurate NN search is highly desirable. Ternary content-addressable memories (TCAMs) have been proposed to accelerate NN search for few-shot learning tasks by implementing $L_\infty$ and Hamming distance metrics, but they cannot achieve software-comparable accuracies. This paper proposes a novel distance function that can be natively evaluated with multi-bit content-addressable memories (MCAMs) based on ferroelectric FETs (FeFETs) to perform a single-step, in-memory NN search. Moreover, this approach achieves accuracies comparable to floating-point precision implementations in software for NN classification and one/few-shot learning tasks. As an example, the proposed method achieves a 98.34% accuracy for a 5-way, 5-shot classification task for the Omniglot dataset (only 0.8% lower than software-based implementations) with a 3-bit MCAM. This represents a 13% accuracy improvement over state-of-the-art TCAM-based implementations at iso-energy and iso-delay. The presented distance function is resilient to the effects of FeFET device-to-device variations. Furthermore, this work experimentally demonstrates a 2-bit implementation of FeFET MCAM using AND arrays from GLOBALFOUNDRIES to further validate proof of concept.
Submitted 13 November, 2020;
originally announced November 2020.
-
Direct S-matrix calculation for diffractive structures and metasurfaces
Authors:
Alexey A. Shcherbakov,
Yury V. Stebunov,
Denis F. Baidin,
Thomas Kampfe,
Yves Jourlin
Abstract:
The paper presents a derivation of analytical components of S-matrices for arbitrary planar diffractive structures and metasurfaces in the Fourier domain. Attained general formulas for S-matrix components can be applied within both formulations in the Cartesian and curvilinear metric. A numerical method based on these results can benefit from all previous improvements of the Fourier domain methods. In addition, we provide expressions for S-matrix calculation in case of periodically corrugated layers of 2D materials, which are valid for arbitrary corrugation depth-to-period ratios. As an example the derived equations are used to simulate resonant grating excitation of graphene plasmons and an impact of silica interlayer on corresponding reflection curves.
Submitted 25 April, 2018; v1 submitted 22 December, 2017;
originally announced December 2017.
-
Measurement of Surface Acoustic Wave Resonances in Ferroelectric Domains by Microwave Microscopy
Authors:
Scott R. Johnston,
Yongliang Yang,
Yong-Tao Cui,
Eric Yue Ma,
Thomas Kämpfe,
Lukas M. Eng,
Jian Zhou,
Yan-Feng Chen,
Minghui Lu,
Zhi-Xun Shen
Abstract:
Surface Acoustic Wave (SAW) resonances were imaged within a closed domain in the ferroelectric LiTaO$_3$ via scanning Microwave Impedance Microscopy (MIM). The MIM probe is used for both SAW generation and measurement, allowing contact-less measurement within a mesoscopic structure. Measurements taken over a range of microwave frequencies are consistent with a constant acoustic velocity, demonstrating the acoustic nature of the measurement.
Submitted 1 May, 2017;
originally announced May 2017.