Search | arXiv e-print repository

Throughput and Fairness Trade-off Balancing for UAV-Enabled Wireless Communication Systems

Authors: Kejie Ni, **gqing Wang, Wenchi Cheng, Wei Zhang

Abstract: Given the imperative of 6G networks' ubiquitous connectivity, along with the inherent mobility and cost-effectiveness of unmanned aerial vehicles (UAVs), UAVs play a critical role within 6G wireless networks. Despite advancements in enhancing the UAV-enabled communication systems' throughput in existing studies, there remains a notable gap in addressing issues concerning user fairness and quality-… ▽ More Given the imperative of 6G networks' ubiquitous connectivity, along with the inherent mobility and cost-effectiveness of unmanned aerial vehicles (UAVs), UAVs play a critical role within 6G wireless networks. Despite advancements in enhancing the UAV-enabled communication systems' throughput in existing studies, there remains a notable gap in addressing issues concerning user fairness and quality-of-service (QoS) provisioning and lacks an effective scheme to depict the trade-off between system throughput and user fairness. To solve the above challenges, in this paper we introduce a novel fairness control scheme for UAV-enabled wireless communication systems based on a new weighted function. First, we propose a throughput combining model based on a new weighted function with fairness considering. Second, we formulate the optimization problem to maximize the weighted sum of all users' throughput. Third, we decompose the optimization problem and propose an efficient iterative algorithm to solve it. Finally, simulation results are provided to demonstrate the considerable potential of our proposed scheme in fairness and QoS provisioning. △ Less

Submitted 7 June, 2024; originally announced June 2024.

Comments: submit to 2024 IEEE GLOBECOM

arXiv:2406.02833 [pdf, other]

DenoDet: Attention as Deformable Multi-Subspace Feature Denoising for Target Detection in SAR Images

Authors: Yimian Dai, Minrui Zou, Yuxuan Li, Xiang Li, Kang Ni, Jian Yang

Abstract: Synthetic Aperture Radar (SAR) target detection has long been impeded by inherent speckle noise and the prevalence of diminutive, ambiguous targets. While deep neural networks have advanced SAR target detection, their intrinsic low-frequency bias and static post-training weights falter with coherent noise and preserving subtle details across heterogeneous terrains. Motivated by traditional SAR ima… ▽ More Synthetic Aperture Radar (SAR) target detection has long been impeded by inherent speckle noise and the prevalence of diminutive, ambiguous targets. While deep neural networks have advanced SAR target detection, their intrinsic low-frequency bias and static post-training weights falter with coherent noise and preserving subtle details across heterogeneous terrains. Motivated by traditional SAR image denoising, we propose DenoDet, a network aided by explicit frequency domain transform to calibrate convolutional biases and pay more attention to high-frequencies, forming a natural multi-scale subspace representation to detect targets from the perspective of multi-subspace denoising. We design TransDeno, a dynamic frequency domain attention module that performs as a transform domain soft thresholding operation, dynamically denoising across subspaces by preserving salient target signals and attenuating noise. To adaptively adjust the granularity of subspace processing, we also propose a deformable group fully-connected layer (DeGroFC) that dynamically varies the group conditioned on the input features. Without bells and whistles, our plug-and-play TransDeno sets state-of-the-art scores on multiple SAR target detection datasets. The code is available at https://github.com/GrokCV/GrokSAR. △ Less

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2405.04700 [pdf, other]

Robust Implementation of Retrieval-Augmented Generation on Edge-based Computing-in-Memory Architectures

Authors: Ruiyang Qin, Zheyu Yan, Dewen Zeng, Zhenge Jia, Dancheng Liu, Jianbo Liu, Zhi Zheng, Ningyuan Cao, Kai Ni, **jun Xiong, Yiyu Shi

Abstract: Large Language Models (LLMs) deployed on edge devices learn through fine-tuning and updating a certain portion of their parameters. Although such learning methods can be optimized to reduce resource utilization, the overall required resources remain a heavy burden on edge devices. Instead, Retrieval-Augmented Generation (RAG), a resource-efficient LLM learning method, can improve the quality of th… ▽ More Large Language Models (LLMs) deployed on edge devices learn through fine-tuning and updating a certain portion of their parameters. Although such learning methods can be optimized to reduce resource utilization, the overall required resources remain a heavy burden on edge devices. Instead, Retrieval-Augmented Generation (RAG), a resource-efficient LLM learning method, can improve the quality of the LLM-generated content without updating model parameters. However, the RAG-based LLM may involve repetitive searches on the profile data in every user-LLM interaction. This search can lead to significant latency along with the accumulation of user data. Conventional efforts to decrease latency result in restricting the size of saved user data, thus reducing the scalability of RAG as user data continuously grows. It remains an open question: how to free RAG from the constraints of latency and scalability on edge devices? In this paper, we propose a novel framework to accelerate RAG via Computing-in-Memory (CiM) architectures. It accelerates matrix multiplications by performing in-situ computation inside the memory while avoiding the expensive data transfer between the computing unit and memory. Our framework, Robust CiM-backed RAG (RoCR), utilizing a novel contrastive learning-based training method and noise-aware training, can enable RAG to efficiently search profile data with CiM. To the best of our knowledge, this is the first work utilizing CiM to accelerate RAG. △ Less

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2404.14316 [pdf, other]

Automated Long Answer Grading with RiceChem Dataset

Authors: Shashank Sonkar, Kangqi Ni, Lesa Tran Lu, Kristi Kincaid, John S. Hutchinson, Richard G. Baraniuk

Abstract: We introduce a new area of study in the field of educational Natural Language Processing: Automated Long Answer Grading (ALAG). Distinguishing itself from Automated Short Answer Grading (ASAG) and Automated Essay Grading (AEG), ALAG presents unique challenges due to the complexity and multifaceted nature of fact-based long answers. To study ALAG, we introduce RiceChem, a dataset derived from a col… ▽ More We introduce a new area of study in the field of educational Natural Language Processing: Automated Long Answer Grading (ALAG). Distinguishing itself from Automated Short Answer Grading (ASAG) and Automated Essay Grading (AEG), ALAG presents unique challenges due to the complexity and multifaceted nature of fact-based long answers. To study ALAG, we introduce RiceChem, a dataset derived from a college chemistry course, featuring real student responses to long-answer questions with an average word count notably higher than typical ASAG datasets. We propose a novel approach to ALAG by formulating it as a rubric entailment problem, employing natural language inference models to verify whether each criterion, represented by a rubric item, is addressed in the student's response. This formulation enables the effective use of MNLI for transfer learning, significantly improving the performance of models on the RiceChem dataset. We demonstrate the importance of rubric-based formulation in ALAG, showcasing its superiority over traditional score-based approaches in capturing the nuances of student responses. We also investigate the performance of models in cold start scenarios, providing valuable insights into the practical deployment considerations in educational settings. Lastly, we benchmark state-of-the-art open-sourced Large Language Models (LLMs) on RiceChem and compare their results to GPT models, highlighting the increased complexity of ALAG compared to ASAG. Despite leveraging the benefits of a rubric-based approach and transfer learning from MNLI, the lower performance of LLMs on RiceChem underscores the significant difficulty posed by the ALAG task. With this work, we offer a fresh perspective on grading long, fact-based answers and introduce a new dataset to stimulate further research in this important area. Code: \url{https://github.com/luffycodes/Automated-Long-Answer-Grading}. △ Less

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.00196 [pdf, other]

Combined Static Analysis and Machine Learning Prediction for Application Debloating

Authors: Chris Porter, Sharjeel Khan, Kangqi Ni, Santosh Pande

Abstract: Software debloating can effectively thwart certain code reuse attacks by reducing attack surfaces to break gadget chains. Approaches based on static analysis enable a reduced set of functions reachable at a callsite for execution by leveraging static properties of the callgraph. This achieves low runtime overhead, but the function set is conservatively computed, negatively affecting reduction. In… ▽ More Software debloating can effectively thwart certain code reuse attacks by reducing attack surfaces to break gadget chains. Approaches based on static analysis enable a reduced set of functions reachable at a callsite for execution by leveraging static properties of the callgraph. This achieves low runtime overhead, but the function set is conservatively computed, negatively affecting reduction. In contrast, approaches based on machine learning (ML) have much better precision and can sharply reduce function sets, leading to significant improvement in attack surface. Nevertheless, mispredictions occur in ML-based approaches. These cause overheads, and worse, there is no clear way to distinguish between mispredictions and actual attacks. In this work, we contend that a software debloating approach that incorporates ML-based predictions at runtime is realistic in a whole application setting, and that it can achieve significant attack surface reductions beyond the state of the art. We develop a framework, Predictive Debloat with Static Guarantees (PDSG). PDSG is fully sound and works on application source code. At runtime it predicts the dynamic callee set emanating from a callsite, and to resolve mispredictions, it employs a lightweight audit based on static invariants of call chains. We deduce the invariants offline and assert that they hold at runtime when there is a misprediction. To the best of our knowledge, it achieves the highest gadget reductions among similar techniques on SPEC CPU 2017, reducing 82.5% of the total gadgets on average. It triggers misprediction checks on only 3.8% of the total predictions invoked at runtime, and it leverages Datalog to verify dynamic call sequences conform to the static call relations. It has an overhead of 8.9%, which makes the scheme attractive for practical deployments. △ Less

Submitted 29 March, 2024; originally announced April 2024.

arXiv:2403.04981 [pdf, other]

Paving the Way for Pass Disturb Free Vertical NAND Storage via A Dedicated and String-Compatible Pass Gate

Authors: Zijian Zhao, Sola Woo, Khandker Akif Aabrar, Sharadindu Gopal Kirtania, Zhouhang Jiang, Shan Deng, Yi Xiao, Halid Mulaosmanovic, Stefan Duenkel, Dominik Kleimaier, Steven Soss, Sven Beyer, Rajiv Joshi, Scott Meninger, Mohamed Mohamed, Kijoon Kim, Jongho Woo, Suhwan Lim, Kwangsoo Kim, Wanki Kim, Daewon Ha, Vijaykrishnan Narayanan, Suman Datta, Shimeng Yu, Kai Ni

Abstract: In this work, we propose a dual-port cell design to address the pass disturb in vertical NAND storage, which can pass signals through a dedicated and string-compatible pass gate. We demonstrate that: i) the pass disturb-free feature originates from weakening of the depolarization field by the pass bias at the high-${V}_{TH}$ (HVT) state and the screening of the applied field by channel at the low-… ▽ More In this work, we propose a dual-port cell design to address the pass disturb in vertical NAND storage, which can pass signals through a dedicated and string-compatible pass gate. We demonstrate that: i) the pass disturb-free feature originates from weakening of the depolarization field by the pass bias at the high-${V}_{TH}$ (HVT) state and the screening of the applied field by channel at the low-${V}_{TH}$ (LVT) state; ii) combined simulations and experimental demonstrations of dual-port design verify the disturb-free operation in a NAND string, overcoming a key challenge in single-port designs; iii) the proposed design can be incorporated in a highly scaled vertical NAND FeFET string and the pass gate can be incorporated into the existing 3D NAND with the negligible overhead of the pass gate interconnection through a global bottom pass gate contact in the substrate. △ Less

Submitted 7 March, 2024; originally announced March 2024.

Comments: 29 pages, 7 figures

arXiv:2402.05000 [pdf, other]

Pedagogical Alignment of Large Language Models

Authors: Shashank Sonkar, Kangqi Ni, Sapana Chaudhary, Richard G. Baraniuk

Abstract: In this paper, we introduce the novel concept of pedagogically aligned Large Language Models (LLMs) that signifies a transformative shift in the application of LLMs within educational contexts. Rather than providing direct responses to user queries, pedagogically-aligned LLMs function as scaffolding tools, breaking complex problems into manageable subproblems and guiding students towards the final… ▽ More In this paper, we introduce the novel concept of pedagogically aligned Large Language Models (LLMs) that signifies a transformative shift in the application of LLMs within educational contexts. Rather than providing direct responses to user queries, pedagogically-aligned LLMs function as scaffolding tools, breaking complex problems into manageable subproblems and guiding students towards the final answer through constructive feedback and hints. The objective is to equip learners with problem-solving strategies that deepen their understanding and internalization of the subject matter. Previous research in this field has primarily applied the supervised finetuning approach without framing the objective as an alignment problem, hence not employing reinforcement learning through human feedback (RLHF) methods. This study reinterprets the narrative by viewing the task through the lens of alignment and demonstrates how RLHF methods emerge naturally as a superior alternative for aligning LLM behaviour. Building on this perspective, we propose a novel approach for constructing a reward dataset specifically designed for the pedagogical alignment of LLMs. We apply three state-of-the-art RLHF algorithms and find that they outperform SFT significantly. Our qualitative analyses across model differences and hyperparameter sensitivity further validate the superiority of RLHF over SFT. Also, our study sheds light on the potential of online feedback for enhancing the performance of pedagogically-aligned LLMs, thus providing valuable insights for the advancement of these models in educational settings. △ Less

Submitted 7 February, 2024; originally announced February 2024.

arXiv:2312.17444 [pdf, other]

Reconfigurable Frequency Multipliers Based on Complementary Ferroelectric Transistors

Authors: Haotian Xu, Jianyi Yang, Cheng Zhuo, Thomas Kämpfe, Kai Ni, Xunzhao Yin

Abstract: Frequency multipliers, a class of essential electronic components, play a pivotal role in contemporary signal processing and communication systems. They serve as crucial building blocks for generating high-frequency signals by multiplying the frequency of an input signal. However, traditional frequency multipliers that rely on nonlinear devices often require energy- and area-consuming filtering an… ▽ More Frequency multipliers, a class of essential electronic components, play a pivotal role in contemporary signal processing and communication systems. They serve as crucial building blocks for generating high-frequency signals by multiplying the frequency of an input signal. However, traditional frequency multipliers that rely on nonlinear devices often require energy- and area-consuming filtering and amplification circuits, and emerging designs based on an ambipolar ferroelectric transistor require costly non-trivial characteristic tuning or complex technology process. In this paper, we show that a pair of standard ferroelectric field effect transistors (FeFETs) can be used to build compact frequency multipliers without aforementioned technology issues. By leveraging the tunable parabolic shape of the 2FeFET structures' transfer characteristics, we propose four reconfigurable frequency multipliers, which can switch between signal transmission and frequency doubling. Furthermore, based on the 2FeFET structures, we propose four frequency multipliers that realize triple, quadruple frequency modes, elucidating a scalable methodology to generate more multiplication harmonics of the input frequency. Performance metrics such as maximum operating frequency, power, etc., are evaluated and compared with existing works. We also implement a practical case of frequency modulation scheme based on the proposed reconfigurable multipliers without additional devices. Our work provides a novel path of scalable and reconfigurable frequency multiplier designs based on devices that have characteristics similar to FeFETs, and show that FeFETs are a promising candidate for signal processing and communication systems in terms of maximum operating frequency and power. △ Less

Submitted 28 December, 2023; originally announced December 2023.

Comments: 6 pages, 8 figures, 1 table. Accepted by Design Automation and Test in Europe (DATE) 2024

arXiv:2312.17442 [pdf, other]

Low Power and Temperature-Resilient Compute-In-Memory Based on Subthreshold-FeFET

Authors: Yifei Zhou, Xuchu Huang, Jianyi Yang, Kai Ni, Hussam Amrouch, Cheng Zhuo, Xunzhao Yin

Abstract: Compute-in-memory (CiM) is a promising solution for addressing the challenges of artificial intelligence (AI) and the Internet of Things (IoT) hardware such as 'memory wall' issue. Specifically, CiM employing nonvolatile memory (NVM) devices in a crossbar structure can efficiently accelerate multiply-accumulation (MAC) computation, a crucial operator in neural networks among various AI models. Low… ▽ More Compute-in-memory (CiM) is a promising solution for addressing the challenges of artificial intelligence (AI) and the Internet of Things (IoT) hardware such as 'memory wall' issue. Specifically, CiM employing nonvolatile memory (NVM) devices in a crossbar structure can efficiently accelerate multiply-accumulation (MAC) computation, a crucial operator in neural networks among various AI models. Low power CiM designs are thus highly desired for further energy efficiency optimization on AI models. Ferroelectric FET (FeFET), an emerging device, is attractive for building ultra-low power CiM array due to CMOS compatibility, high ION/IOFF ratio, etc. Recent studies have explored FeFET based CiM designs that achieve low power consumption. Nevertheless, subthreshold-operated FeFETs, where the operating voltages are scaled down to the subthreshold region to reduce array power consumption, are particularly vulnerable to temperature drift, leading to accuracy degradation. To address this challenge, we propose a temperature-resilient 2T-1FeFET CiM design that performs MAC operations reliably at subthreahold region from 0 to 85 Celsius, while consuming ultra-low power. Benchmarked against the VGG neural network architecture running the CIFAR-10 dataset, the proposed 2T-1FeFET CiM design achieves 89.45% CIFAR-10 test accuracy. Compared to previous FeFET based CiM designs, it exhibits immunity to temperature drift at an 8-bit wordlength scale, and achieves better energy efficiency with 2866 TOPS/W. △ Less

Submitted 10 January, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

Comments: 6 pages, 9 figures, 2 tables. Accepted by Design Automation and Test in Europe (DATE) 2024

arXiv:2312.15444 [pdf, other]

Variation-Resilient FeFET-Based In-Memory Computing Leveraging Probabilistic Deep Learning

Authors: Bibhas Manna, Arnob Saha, Zhouhang Jiang, Kai Ni, Abhronil Sengupta

Abstract: Reliability issues stemming from device level non-idealities of non-volatile emerging technologies like ferroelectric field-effect transistors (FeFET), especially at scaled dimensions, cause substantial degradation in the accuracy of In-Memory crossbar-based AI systems. In this work, we present a variation-aware design technique to characterize the device level variations and to mitigate their imp… ▽ More Reliability issues stemming from device level non-idealities of non-volatile emerging technologies like ferroelectric field-effect transistors (FeFET), especially at scaled dimensions, cause substantial degradation in the accuracy of In-Memory crossbar-based AI systems. In this work, we present a variation-aware design technique to characterize the device level variations and to mitigate their impact on hardware accuracy employing a Bayesian Neural Network (BNN) approach. An effective conductance variation model is derived from the experimental measurements of cycle-to-cycle (C2C) and device-to-device (D2D) variations performed on FeFET devices fabricated using 28 nm high-$k$ metal gate technology. The variations were found to be a function of different conductance states within the given programming range, which sharply contrasts earlier efforts where a fixed variation dispersion was considered for all conductance values. Such variation characteristics formulated for three different device sizes at different read voltages were provided as prior variation information to the BNN to yield a more exact and reliable inference. Near-ideal accuracy for shallow networks (MLP5 and LeNet models) on the MNIST dataset and limited accuracy decline by $\sim$3.8-16.1% for deeper AlexNet models on CIFAR10 dataset under a wide range of variations corresponding to different device sizes and read voltages, demonstrates the efficacy of our proposed device-algorithm co-design technique. △ Less

Submitted 13 March, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

arXiv:2310.04940 [pdf, other]

SEE-MCAM: Scalable Multi-bit FeFET Content Addressable Memories for Energy Efficient Associative Search

Authors: Shengxi Shou, Che-Kai Liu, Sanggeon Yun, Zishen Wan, Kai Ni, Mohsen Imani, X. Sharon Hu, Jianyi Yang, Cheng Zhuo, Xunzhao Yin

Abstract: In this work, we propose SEE-MCAM, scalable and compact multi-bit CAM (MCAM) designs that utilize the three-terminal ferroelectric FET (FeFET) as the proxy. By exploiting the multi-level-cell characteristics of FeFETs, our proposed SEE-MCAM designs enable multi-bit associative search functions and achieve better energy efficiency and performance than existing FeFET-based CAM designs. We validated… ▽ More In this work, we propose SEE-MCAM, scalable and compact multi-bit CAM (MCAM) designs that utilize the three-terminal ferroelectric FET (FeFET) as the proxy. By exploiting the multi-level-cell characteristics of FeFETs, our proposed SEE-MCAM designs enable multi-bit associative search functions and achieve better energy efficiency and performance than existing FeFET-based CAM designs. We validated the functionality of our proposed designs by achieving 3 bits per cell CAM functionality, resulting in 3x improvement in storage density. The area per bit of the proposed SEE-MCAM cell is 8% of the conventional CMOS CAM. We thoroughly investigated the scalability and robustness of the proposed design. Evaluation results suggest that the proposed 2FeFET-1T SEE-MCAM achieves 9.8x more energy efficiency and 1.6x less search latency compared to the CMOS CAM, respectively. When compared to existing MCAM designs, the proposed SEE-MCAM can achieve 8.7x and 4.9x more energy efficiency than ReRAM-based and FeFET-based MCAMs, respectively. Benchmarking results show that our approach provides up to 3 orders of magnitude improvement in speedup and energy efficiency over a GPU implementation in accelerating a novel quantized hyperdimensional computing (HDC) application. △ Less

Submitted 7 October, 2023; originally announced October 2023.

Comments: Accepted by Internation Conference on Computer-Aided Design (ICCAD), 2023

arXiv:2309.13853 [pdf, other]

A Ferroelectric Compute-in-Memory Annealer for Combinatorial Optimization Problems

Authors: Xunzhao Yin, Yu Qian, Alptekin Vardar, Marcel Gunther, Franz Muller, Nellie Laleni, Zijian Zhao, Zhouhang Jiang, Zhiguo Shi, Yiyu Shi, Xiao Gong, Cheng Zhuo, Thomas Kampfe, Kai Ni

Abstract: Computationally hard combinatorial optimization problems (COPs) are ubiquitous in many applications, including logistical planning, resource allocation, chip design, drug explorations, and more. Due to their critical significance and the inability of conventional hardware in efficiently handling scaled COPs, there is a growing interest in develo** computing hardware tailored specifically for COP… ▽ More Computationally hard combinatorial optimization problems (COPs) are ubiquitous in many applications, including logistical planning, resource allocation, chip design, drug explorations, and more. Due to their critical significance and the inability of conventional hardware in efficiently handling scaled COPs, there is a growing interest in develo** computing hardware tailored specifically for COPs, including digital annealers, dynamical Ising machines, and quantum/photonic systems. However, significant hurdles still remain, such as the memory access issue, the system scalability and restricted applicability to certain types of COPs, and VLSI-incompatibility, respectively. Here, a ferroelectric field effect transistor (FeFET) based compute-in-memory (CiM) annealer is proposed. After converting COPs into quadratic unconstrained binary optimization (QUBO) formulations, a hardware-algorithm co-design is conducted, yielding an energy-efficient, versatile, and scalable hardware for COPs. To accelerate the core vector-matrix-vector (VMV) multiplication of QUBO formulations, a FeFET based CiM array is exploited, which can accelerate the intended operation in-situ due to its unique three-terminal structure. In particular, a lossless compression technique is proposed to prune typically sparse QUBO matrix to reduce hardware cost. Furthermore, a multi-epoch simulated annealing (MESA) algorithm is proposed to replace conventional simulated annealing for its faster convergence and better solution quality. The effectiveness of the proposed techniques is validated through the utilization of developed chip prototypes for successfully solving graph coloring problem, indicating great promise of FeFET CiM annealer in solving general COPs. △ Less

Submitted 24 September, 2023; originally announced September 2023.

Comments: 39 pages, 12 figures

arXiv:2306.01863 [pdf, other]

Embedding Security into Ferroelectric FET Array via In-Situ Memory Operation

Authors: Yixin Xu, Yi Xiao, Zijian Zhao, Franz Müller, Alptekin Vardar, Xiao Gong, Sumitha George, Thomas Kämpfe, Vijaykrishnan Narayanan, Kai Ni

Abstract: Non-volatile memories (NVMs) have the potential to reshape next-generation memory systems because of their promising properties of near-zero leakage power consumption, high density and non-volatility. However, NVMs also face critical security threats that exploit the non-volatile property. Compared to volatile memory, the capability of retaining data even after power down makes NVM more vulnerable… ▽ More Non-volatile memories (NVMs) have the potential to reshape next-generation memory systems because of their promising properties of near-zero leakage power consumption, high density and non-volatility. However, NVMs also face critical security threats that exploit the non-volatile property. Compared to volatile memory, the capability of retaining data even after power down makes NVM more vulnerable. Existing solutions to address the security issues of NVMs are mainly based on Advanced Encryption Standard (AES), which incurs significant performance and power overhead. In this paper, we propose a lightweight memory encryption/decryption scheme by exploiting in-situ memory operations with negligible overhead. To validate the feasibility of the encryption/decryption scheme, device-level and array-level experiments are performed using ferroelectric field effect transistor (FeFET) as an example NVM without loss of generality. Besides, a comprehensive evaluation is performed on a 128x128 FeFET AND-type memory array in terms of area, latency, power and throughput. Compared with the AES-based scheme, our scheme shows around 22.6x/14.1x increase in encryption/decryption throughput with negligible power penalty. Furthermore, we evaluate the performance of our scheme over the AES-based scheme when deploying different neural network workloads. Our scheme yields significant latency reduction by 90% on average for encryption and decryption processes. △ Less

Submitted 2 June, 2023; originally announced June 2023.

arXiv:2305.01484 [pdf, other]

Powering Disturb-Free Reconfigurable Computing and Tunable Analog Electronics with Dual-Port Ferroelectric FET

Authors: Zijian Zhao, Shan Deng, Swetaki Chatterjee, Zhouhang Jiang, Muhammad Shaffatul Islam, Yi Xiao, Yixin Xu, Scott Meninger, Mohamed Mohamed, Rajiv Joshi, Yogesh Singh Chauhan, Halid Mulaosmanovic, Stefan Duenkel, Dominik Kleimaier, Sven Beyer, Hussam Amrouch, Vijaykrishnan Narayanan, Kai Ni

Abstract: Single-port ferroelectric FET (FeFET) that performs write and read operations on the same electrical gate prevents its wide application in tunable analog electronics and suffers from read disturb, especially to the high-threshold voltage (VTH) state as the retention energy barrier is reduced by the applied read bias. To address both issues, we propose to adopt a read disturb-free dual-port FeFET w… ▽ More Single-port ferroelectric FET (FeFET) that performs write and read operations on the same electrical gate prevents its wide application in tunable analog electronics and suffers from read disturb, especially to the high-threshold voltage (VTH) state as the retention energy barrier is reduced by the applied read bias. To address both issues, we propose to adopt a read disturb-free dual-port FeFET where write is performed on the gate featuring a ferroelectric layer and the read is done on a separate gate featuring a non-ferroelectric dielectric. Combining the unique structure and the separate read gate, read disturb is eliminated as the applied field is aligned with polarization in the high-VTH state and thus improving its stability, while it is screened by the channel inversion charge and exerts no negative impact on the low-VTH state stability. Comprehensive theoretical and experimental validation have been performed on fully-depleted silicon-on-insulator (FDSOI) FeFETs integrated on 22 nm platform, which intrinsically has dual ports with its buried oxide layer acting as the non-ferroelectric dielectric. Novel applications that can exploit the proposed dual-port FeFET are proposed and experimentally demonstrated for the first time, including FPGA that harnesses its read disturb-free feature and tunable analog electronics (e.g., frequency tunable ring oscillator in this work) leveraging the separated write and read paths. △ Less

Submitted 2 May, 2023; originally announced May 2023.

Comments: 32 pages

arXiv:2305.00149 [pdf, other]

X-ray Recognition: Patient identification from X-rays using a contrastive objective

Authors: Hao Liang, Kevin Ni, Guha Balakrishnan

Abstract: Recent research demonstrates that deep learning models are capable of precisely extracting bio-information (e.g. race, gender and age) from patients' Chest X-Rays (CXRs). In this paper, we further show that deep learning models are also surprisingly accurate at recognition, i.e., distinguishing CXRs belonging to the same patient from those belonging to different patients. These findings suggest po… ▽ More Recent research demonstrates that deep learning models are capable of precisely extracting bio-information (e.g. race, gender and age) from patients' Chest X-Rays (CXRs). In this paper, we further show that deep learning models are also surprisingly accurate at recognition, i.e., distinguishing CXRs belonging to the same patient from those belonging to different patients. These findings suggest potential privacy considerations that the medical imaging community should consider with the proliferation of large public CXR databases. △ Less

Submitted 28 April, 2023; originally announced May 2023.

arXiv:2305.00147 [pdf, other]

Visualizing chest X-ray dataset biases using GANs

Authors: Hao Liang, Kevin Ni, Guha Balakrishnan

Abstract: Recent work demonstrates that images from various chest X-ray datasets contain visual features that are strongly correlated with protected demographic attributes like race and gender. This finding raises issues of fairness, since some of these factors may be used by downstream algorithms for clinical predictions. In this work, we propose a framework, using generative adversarial networks (GANs), t… ▽ More Recent work demonstrates that images from various chest X-ray datasets contain visual features that are strongly correlated with protected demographic attributes like race and gender. This finding raises issues of fairness, since some of these factors may be used by downstream algorithms for clinical predictions. In this work, we propose a framework, using generative adversarial networks (GANs), to visualize what features are most different between X-rays belonging to two demographic subgroups. △ Less

Submitted 5 September, 2023; v1 submitted 28 April, 2023; originally announced May 2023.

Comments: Medical Imaging with Deep Learning(MIDL) 2023

arXiv:2212.08202 [pdf]

Voltage-controlled Cryogenic Boolean Logic Family Based on Ferroelectric SQUID

Authors: Shamiul Alam, Md Shafayat Hossain, Kai Ni, Vijaykrishnan Narayanan, Ahmedullah Aziz

Abstract: The recent progress in quantum computing and space exploration led to a surge in interest in cryogenic electronics. Superconducting devices such as Josephson junction, Josephson field effect transistor, cryotron, and superconducting quantum interference device (SQUID) are traditionally used to build cryogenic logic gates. However, due to the superconducting nature, gate-voltage-based control of th… ▽ More The recent progress in quantum computing and space exploration led to a surge in interest in cryogenic electronics. Superconducting devices such as Josephson junction, Josephson field effect transistor, cryotron, and superconducting quantum interference device (SQUID) are traditionally used to build cryogenic logic gates. However, due to the superconducting nature, gate-voltage-based control of these devices is extremely difficult. Even more challenging is to cascade the logic gates because most of these devices require current bias for their operation. Therefore, these devices are not as convenient as the semiconducting transistors to design logic gates. Here, to overcome these challenges, we propose a ferroelectric SQUID (FeSQUID) based voltage-controlled logic gates. FeSQUID exhibits two different critical current levels for two different voltage-switchable polarization states of the ferroelectric. We utilize the polarization-dependent (hence, voltage-controllable) superconducting to resistive switching of FeSQUID to design Boolean logic gates such as Copy, NOT, AND, and OR gates. The operations of these gates are verified using a Verilog-A-based compact model of FeSQUID. Finally, to demonstrate the fanning out capability of FeSQUID-based logic family, we simulate a 2-input XOR gate using FeSQUID-based NOT, AND, and OR gates. Together with the ongoing progress on FeSQUID-based non-volatile memory, our designed FeSQUID-based logic family will enable all-FeSQUID based cryogenic computer, ensure minimum mismatch between logic and memory blocks in terms of speed, power consumption, and fabrication process. △ Less

Submitted 15 December, 2022; originally announced December 2022.

Comments: 11 pages, 4 figures

arXiv:2212.04973 [pdf]

Eliminating Leakage in Volatile Memory with Anti-Ferroelectric Transistors

Authors: Hongtao Zhong, Zijie Zheng, Leming Jiao, Zuopu Zhou, Chen Sun, Xiaoyang Ma, Vijaykrishnan Narayanan, Huazhong Yang, Kai Ni, Xiao Gong, Xueqing Li

Abstract: Cache serves as a temporary data memory module in many general-purpose processors and domain-specific accelerators. Its density, power, speed, and reliability play a critical role in enhancing the overall system performance and quality of service. Conventional volatile memories, including static random-access memory (SRAM) and embedded dynamic random-access memory (eDRAM) in the complementary meta… ▽ More Cache serves as a temporary data memory module in many general-purpose processors and domain-specific accelerators. Its density, power, speed, and reliability play a critical role in enhancing the overall system performance and quality of service. Conventional volatile memories, including static random-access memory (SRAM) and embedded dynamic random-access memory (eDRAM) in the complementary metal-oxide-semiconductor technology, have high performance and good reliability. However, the inherent leakage in both SRAM and eDRAM hinders further improvement towards smaller feature sizes and higher energy efficiency. Although the emerging nonvolatile memories can eliminate the leakage efficiently, the penalties of lower speed and degraded reliability are significant. This article reveals a new opportunity towards leakage-free volatile static memory beyond the known paradigms of existing volatile and nonvolatile memories. By engineering a double-well energy landscape with the assistance of a clam** voltage bias, leakage-free and refresh-free state retention of volatile memory is achieved for the first time. This new memory is highlighted by both the ultra-low leakage of nonvolatile memories and the speed, energy, and reliability advantages of volatile memories. A proof-of-concept memory is demonstrated using in-house anti-ferroelectric field-effect transistors (AFeFETs), delivering an extrapolated endurance of about 1012 cycles, a retention time of over 10 years, and no subthreshold channel leakage current. Such a new concept of AFeFET-based memory enables an improved balance between density, power, and reliability beyond all existing memory solutions. △ Less

Submitted 1 February, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

arXiv:2212.00089 [pdf, other]

Ferroelectric FET based Context-Switching FPGA Enabling Dynamic Reconfiguration for Adaptive Deep Learning Machines

Authors: Yixin Xu, Zijian Zhao, Yi Xiao, Tongguang Yu, Halid Mulaosmanovic, Dominik Kleimaier, Stefan Duenkel, Sven Beyer, Xiao Gong, Rajiv Joshi, X. Sharon Hu, Shixian Wen, Amanda Sofie Rios, Kiran Lekkala, Laurent Itti, Eric Homan, Sumitha George, Vijaykrishnan Narayanan, Kai Ni

Abstract: Field Programmable Gate Array (FPGA) is widely used in acceleration of deep learning applications because of its reconfigurability, flexibility, and fast time-to-market. However, conventional FPGA suffers from the tradeoff between chip area and reconfiguration latency, making efficient FPGA accelerations that require switching between multiple configurations still elusive. In this paper, we perfor… ▽ More Field Programmable Gate Array (FPGA) is widely used in acceleration of deep learning applications because of its reconfigurability, flexibility, and fast time-to-market. However, conventional FPGA suffers from the tradeoff between chip area and reconfiguration latency, making efficient FPGA accelerations that require switching between multiple configurations still elusive. In this paper, we perform technology-circuit-architecture co-design to break this tradeoff with no additional area cost and lower power consumption compared with conventional designs while providing dynamic reconfiguration, which can hide the reconfiguration time behind the execution time. Leveraging the intrinsic transistor structure and non-volatility of ferroelectric FET (FeFET), compact FPGA primitives are proposed and experimentally verified, including 1FeFET look-up table (LUT) cell, 1FeFET routing cell for connection blocks (CBs) and switch boxes (SBs). To support dynamic reconfiguration, two local copies of primitives are placed in parallel, which enables loading of arbitrary configuration without interrupting the active configuration execution. A comprehensive evaluation shows that compared with the SRAM-based FPGA, our dynamic reconfiguration design shows 63.0%/71.1% reduction in LUT/CB area and 82.7%/53.6% reduction in CB/SB power consumption with minimal penalty in the critical path delay (9.6%). We further implement a Super-Sub network model to show the benefit from the context-switching capability of our design. We also evaluate the timing performance of our design over conventional FPGA in various application scenarios. In one scenario that users switch between two preloaded configurations, our design yields significant time saving by 78.7% on average. In the other scenario of implementing multiple configurations with dynamic reconfiguration, our design offers time saving of 20.3% on average. △ Less

Submitted 30 November, 2022; originally announced December 2022.

Comments: 54 pages, 15 figures

arXiv:2211.01556 [pdf, other]

Ground Plane Matters: Picking Up Ground Plane Prior in Monocular 3D Object Detection

Authors: Fan Yang, Xinhao Xu, Hui Chen, Yuchen Guo, Jungong Han, Kai Ni, Guiguang Ding

Abstract: The ground plane prior is a very informative geometry clue in monocular 3D object detection (M3OD). However, it has been neglected by most mainstream methods. In this paper, we identify two key factors that limit the applicability of ground plane prior: the projection point localization issue and the ground plane tilt issue. To pick up the ground plane prior for M3OD, we propose a Ground Plane Enh… ▽ More The ground plane prior is a very informative geometry clue in monocular 3D object detection (M3OD). However, it has been neglected by most mainstream methods. In this paper, we identify two key factors that limit the applicability of ground plane prior: the projection point localization issue and the ground plane tilt issue. To pick up the ground plane prior for M3OD, we propose a Ground Plane Enhanced Network (GPENet) which resolves both issues at one go. For the projection point localization issue, instead of using the bottom vertices or bottom center of the 3D bounding box (BBox), we leverage the object's ground contact points, which are explicit pixels in the image and easy for the neural network to detect. For the ground plane tilt problem, our GPENet estimates the horizon line in the image and derives a novel mathematical expression to accurately estimate the ground plane equation. An unsupervised vertical edge mining algorithm is also proposed to address the occlusion of the horizon line. Furthermore, we design a novel 3D bounding box deduction method based on a dynamic back projection algorithm, which could take advantage of the accurate contact points and the ground plane equation. Additionally, using only M3OD labels, contact point and horizon line pseudo labels can be easily generated with NO extra data collection and label annotation cost. Extensive experiments on the popular KITTI benchmark show that our GPENet can outperform other methods and achieve state-of-the-art performance, well demonstrating the effectiveness and the superiority of the proposed approach. Moreover, our GPENet works better than other methods in cross-dataset evaluation on the nuScenes dataset. Our code and models will be published. △ Less

Submitted 2 November, 2022; originally announced November 2022.

Comments: 13 pages, 10 figures

arXiv:2209.13685 [pdf, other]

Hybrid Stochastic Synapses Enabled by Scaled Ferroelectric Field-effect Transistors

Authors: A N M Nafiul Islam, Arnob Saha, Zhouhang Jiang, Kai Ni, Abhronil Sengupta

Abstract: Achieving brain-like density and performance in neuromorphic computers necessitates scaling down the size of nanodevices emulating neuro-synaptic functionalities. However, scaling nanodevices results in reduction of programming resolution and emergence of stochastic non-idealities. While prior work has mainly focused on binary transitions, in this work we leverage the stochastic switching of a thr… ▽ More Achieving brain-like density and performance in neuromorphic computers necessitates scaling down the size of nanodevices emulating neuro-synaptic functionalities. However, scaling nanodevices results in reduction of programming resolution and emergence of stochastic non-idealities. While prior work has mainly focused on binary transitions, in this work we leverage the stochastic switching of a three-state ferroelectric field effect transistor (FeFET) to implement a long-term and short-term 2-tier stochastic synaptic memory with a single device. Experimental measurements are performed on a scaled 28nm high-$k$ metal gate technology-based device to develop a probabilistic model of the hybrid stochastic synapse. In addition to the advantage of ultra-low programming energies afforded by scaling, our hardware-algorithm co-design analysis reveals the efficacy of the 2-tier memory in comparison to binary stochastic synapses in on-chip learning tasks -- paving the way for algorithms exploiting multi-state devices with probabilistic transitions beyond deterministic ones. △ Less

Submitted 10 March, 2023; v1 submitted 27 September, 2022; originally announced September 2022.

arXiv:2209.11971 [pdf, other]

A Homogeneous Processing Fabric for Matrix-Vector Multiplication and Associative Search Using Ferroelectric Time-Domain Compute-in-Memory

Authors: Xunzhao Yin, Qingrong Huang, Franz Müller, Shan Deng, Alptekin Vardar, Sourav De, Zhouhang Jiang, Mohsen Imani, Cheng Zhuo, Thomas Kämpfe, Kai Ni

Abstract: In this work, we propose a ferroelectric FET(FeFET) time-domain compute-in-memory (TD-CiM) array as a homogeneous processing fabric for binary multiplication-accumulation (MAC) and content addressable memory (CAM). We demonstrate that: i) the XOR(XNOR)/AND logic function can be realized using a single cell composed of 2FeFETs connected in series; ii) a two-phase computation in an inverter chain wi… ▽ More In this work, we propose a ferroelectric FET(FeFET) time-domain compute-in-memory (TD-CiM) array as a homogeneous processing fabric for binary multiplication-accumulation (MAC) and content addressable memory (CAM). We demonstrate that: i) the XOR(XNOR)/AND logic function can be realized using a single cell composed of 2FeFETs connected in series; ii) a two-phase computation in an inverter chain with each stage featuring the XOR/AND cell to control the associated capacitor loading and the computation results of binary MAC and CAM are reflected in the chain output signal delay, illustrating full digital compatibility; iii) comprehensive theoretical and experimental validation of the proposed 2FeFET cell and inverter delay chains and their robustness against FeFET variation; iv) the homogeneous processing fabric is applied in hyperdimensional computing to show dynamic and fine-grain resource allocation to accommodate different tasks requiring varying demands over the binary MAC and CAM resources. △ Less

Submitted 24 September, 2022; originally announced September 2022.

Comments: 8 pages, 8 figures

arXiv:2208.14678 [pdf]

Ferroelectric FET-based strong physical unclonable function: a low-power, high-reliable and reconfigurable solution for Internet-of-Things security

Authors: Xinrui Guo, Xiaoyang Ma, Franz Muller, Kai Ni, Thomas Kampfe, Yongpan Liu, Vijaykrishnan Narayanan, Xueqing Li

Abstract: Hardware security has been a key concern in modern information technologies. Especially, as the number of Internet-of-Things (IoT) devices grows rapidly, to protect the device security with low-cost security primitives becomes essential, among which Physical Unclonable Function (PUF) is a widely-used solution. In this paper, we propose the first FeFET-based strong PUF exploiting the cycle-to-cycle… ▽ More Hardware security has been a key concern in modern information technologies. Especially, as the number of Internet-of-Things (IoT) devices grows rapidly, to protect the device security with low-cost security primitives becomes essential, among which Physical Unclonable Function (PUF) is a widely-used solution. In this paper, we propose the first FeFET-based strong PUF exploiting the cycle-to-cycle (C2C) variation of FeFETs as the entropy source. Based on the experimental measurements, the proposed PUF shows satisfying performance including high uniformity, uniqueness, reconfigurability and reliability. To resist machine-learning attack, XOR structure was introduced, and simulations show that our proposed PUF has similar resistance to existing attack models with traditional arbiter PUFs. Furthermore, our design is shown to be power-efficient, and highly robust to write voltage, temperature and device size, which makes it a competitive security solution for Internet-of-Things edge devices. △ Less

Submitted 31 August, 2022; originally announced August 2022.

arXiv:2208.08611 [pdf]

Intellectual Property Evaluation Utilizing Machine Learning

Authors: **xin Ding, Yuxin Huang, Keyang Ni, Xueyao Wang, Yinxiao Wang, Yucheng Wang

Abstract: Intellectual properties is increasingly important in the economic development. To solve the pain points by traditional methods in IP evaluation, we are develo** a new technology with machine learning as the core. We have built an online platform and will expand our business in the Greater Bay Area with plans. Intellectual properties is increasingly important in the economic development. To solve the pain points by traditional methods in IP evaluation, we are develo** a new technology with machine learning as the core. We have built an online platform and will expand our business in the Greater Bay Area with plans. △ Less

Submitted 17 August, 2022; originally announced August 2022.

Comments: 5 pages, 2 figures

arXiv:2208.02613 [pdf, other]

Semantic Interleaving Global Channel Attention for Multilabel Remote Sensing Image Classification

Authors: Yongkun Liu, Kesong Ni, Yuhan Zhang, Lijian Zhou, Kun Zhao

Abstract: Multi-Label Remote Sensing Image Classification (MLRSIC) has received increasing research interest. Taking the cooccurrence relationship of multiple labels as additional information helps to improve the performance of this task. Current methods focus on using it to constrain the final feature output of a Convolutional Neural Network (CNN). On the one hand, these methods do not make full use of lab… ▽ More Multi-Label Remote Sensing Image Classification (MLRSIC) has received increasing research interest. Taking the cooccurrence relationship of multiple labels as additional information helps to improve the performance of this task. Current methods focus on using it to constrain the final feature output of a Convolutional Neural Network (CNN). On the one hand, these methods do not make full use of label correlation to form feature representation. On the other hand, they increase the label noise sensitivity of the system, resulting in poor robustness. In this paper, a novel method called Semantic Interleaving Global Channel Attention (SIGNA) is proposed for MLRSIC. First, the label co-occurrence graph is obtained according to the statistical information of the data set. The label co-occurrence graph is used as the input of the Graph Neural Network (GNN) to generate optimal feature representations. Then, the semantic features and visual features are interleaved, to guide the feature expression of the image from the original feature space to the semantic feature space with embedded label relations. SIGNA triggers global attention of feature maps channels in a new semantic feature space to extract more important visual features. Multihead SIGNA based feature adaptive weighting networks are proposed to act on any layer of CNN in a plug-and-play manner. For remote sensing images, better classification performance can be achieved by inserting CNN into the shallow layer. We conduct extensive experimental comparisons on three data sets: UCM data set, AID data set, and DFC15 data set. Experimental results demonstrate that the proposed SIGNA achieves superior classification performance compared to state-of-the-art (SOTA) methods. It is worth mentioning that the codes of this paper will be open to the community for reproducibility research. Our codes are available at https://github.com/kyle-one/SIGNA. △ Less

Submitted 2 January, 2023; v1 submitted 4 August, 2022; originally announced August 2022.

Comments: 14 pages, 13 figures

arXiv:2207.12188 [pdf, other]

COSIME: FeFET based Associative Memory for In-Memory Cosine Similarity Search

Authors: Che-Kai Liu, Haobang Chen, Mohsen Imani, Kai Ni, Arman Kazemi, Ann Franchesca Laguna, Michael Niemier, Xiaobo Sharon Hu, Liang Zhao, Cheng Zhuo, Xunzhao Yin

Abstract: In a number of machine learning models, an input query is searched across the trained class vectors to find the closest feature class vector in cosine similarity metric. However, performing the cosine similarities between the vectors in Von-Neumann machines involves a large number of multiplications, Euclidean normalizations and division operations, thus incurring heavy hardware energy and latency… ▽ More In a number of machine learning models, an input query is searched across the trained class vectors to find the closest feature class vector in cosine similarity metric. However, performing the cosine similarities between the vectors in Von-Neumann machines involves a large number of multiplications, Euclidean normalizations and division operations, thus incurring heavy hardware energy and latency overheads. Moreover, due to the memory wall problem that presents in the conventional architecture, frequent cosine similarity-based searches (CSSs) over the class vectors requires a lot of data movements, limiting the throughput and efficiency of the system. To overcome the aforementioned challenges, this paper introduces COSIME, an general in-memory associative memory (AM) engine based on the ferroelectric FET (FeFET) device for efficient CSS. By leveraging the one-transistor AND gate function of FeFET devices, current-based translinear analog circuit and winner-take-all (WTA) circuitry, COSIME can realize parallel in-memory CSS across all the entries in a memory block, and output the closest word to the input query in cosine similarity metric. Evaluation results at the array level suggest that the proposed COSIME design achieves 333X and 90.5X latency and energy improvements, respectively, and realizes better classification accuracy when compared with an AM design implementing approximated CSS. The proposed in-memory computing fabric is evaluated for an HDC problem, showcasing that COSIME can achieve on average 47.1X and 98.5X speedup and energy efficiency improvements compared with an GPU implementation. △ Less

Submitted 25 July, 2022; originally announced July 2022.

Comments: Accepted by the 41st International Conference on Computer Aided Design (ICCAD), San Diego, USA

arXiv:2205.14729 [pdf]

CMOS-Compatible Ising Machines built using Bistable Latches Coupled through Ferroelectric Transistor Arrays

Authors: Antik Mallick, Zijian Zhao, Mohammad Khairul Bashar, Shamiul Alam, Md Mazharul Islam, Yi Xiao, Yixin Xu, Ahmedullah Aziz, Vijaykrishnan Narayanan, Kai Ni, Nikhil Shukla

Abstract: Realizing compact and scalable Ising machines that are compatible with CMOS-process technology is crucial to the effectiveness and practicality of using such hardware platforms for accelerating computationally intractable problems. Besides the need for realizing compact Ising spins, the implementation of the coupling network, which describes the spin interaction, is also a potential bottleneck in… ▽ More Realizing compact and scalable Ising machines that are compatible with CMOS-process technology is crucial to the effectiveness and practicality of using such hardware platforms for accelerating computationally intractable problems. Besides the need for realizing compact Ising spins, the implementation of the coupling network, which describes the spin interaction, is also a potential bottleneck in the scalability of such platforms. Therefore, in this work, we propose an Ising machine platform that exploits the novel behavior of compact bi-stable CMOS-latches (cross-coupled inverters) as classical Ising spins interacting through highly scalable and CMOS-process compatible ferroelectric-HfO2-based Ferroelectric FETs (FeFETs) which act as coupling elements. We experimentally demonstrate the prototype building blocks of this system, and evaluate the behavior of the scaled system using simulations. We project that the proposed architecture can compute Ising solutions with an efficiency of ~1.04 x 10^8 solutions/W/second. Our work not only provides a pathway to realizing CMOS-compatible designs but also to overcoming their scaling challenges. △ Less

Submitted 29 May, 2022; originally announced May 2022.

Comments: 29 pages, 10 figures

arXiv:2203.07948 [pdf, other]

An Ultra-Compact Single FeFET Binary and Multi-Bit Associative Search Engine

Authors: Xunzhao Yin, Franz Müller, Qingrong Huang, Chao Li, Mohsen Imani, Zeyu Yang, Jiahao Cai, Maximilian Lederer, Ricardo Olivo, Nellie Laleni, Shan Deng, Zijian Zhao, Cheng Zhuo, Thomas Kämpfe, Kai Ni

Abstract: Content addressable memory (CAM) is widely used in associative search tasks for its highly parallel pattern matching capability. To accommodate the increasingly complex and data-intensive pattern matching tasks, it is critical to keep improving the CAM density to enhance the performance and area efficiency. In this work, we demonstrate: i) a novel ultra-compact 1FeFET CAM design that enables paral… ▽ More Content addressable memory (CAM) is widely used in associative search tasks for its highly parallel pattern matching capability. To accommodate the increasingly complex and data-intensive pattern matching tasks, it is critical to keep improving the CAM density to enhance the performance and area efficiency. In this work, we demonstrate: i) a novel ultra-compact 1FeFET CAM design that enables parallel associative search and in-memory hamming distance calculation; ii) a multi-bit CAM for exact search using the same CAM cell; iii) compact device designs that integrate the series resistor current limiter into the intrinsic FeFET structure to turn the 1FeFET1R into an effective 1FeFET cell; iv) a successful 2-step search operation and a sufficient sensing margin of the proposed binary and multi-bit 1FeFET1R CAM array with sizes of practical interests in both experiments and simulations, given the existing unoptimized FeFET device variation; v) 89.9x speedup and 66.5x energy efficiency improvement over the state-of-the art alignment tools on GPU in accelerating genome pattern matching applications through the hyperdimensional computing paradigm. △ Less

Submitted 15 March, 2022; originally announced March 2022.

Comments: 20 pages, 14 figures

arXiv:2110.03855 [pdf, other]

doi 10.1038/s41467-022-29795-3

Hardware Functional Obfuscation With Ferroelectric Active Interconnects

Authors: Tonggunag Yu, Yixin Xu, Shan Deng, Zijian Zhao, Nicolas Jao, You Sung Kim, Stefan Duenkel, Sven Beyer, Kai Ni, Sumitha George, Vijaykrishnan Narayanan

Abstract: Camouflaging gate techniques are typically used in hardware security to prevent reverse engineering. Layout level camouflaging by adding dummy contacts ensures some level of protection against extracting the correct netlist. Threshold voltage manipulation for multi-functional logic with identical layouts has also been introduced for functional obfuscation. All these techniques are implemented at t… ▽ More Camouflaging gate techniques are typically used in hardware security to prevent reverse engineering. Layout level camouflaging by adding dummy contacts ensures some level of protection against extracting the correct netlist. Threshold voltage manipulation for multi-functional logic with identical layouts has also been introduced for functional obfuscation. All these techniques are implemented at the expense of circuit-complexity and with significant area, energy, and delay penalty. In this paper, we propose an efficient hardware encryption technique with minimal complexity and overheads based on ferroelectric field-effect transistor (FeFET) active interconnects. The active interconnect provides run-time reconfigurable inverter-buffer logic by utilizing the threshold voltage programmability of the FeFETs. Our method utilizes only two FeFETs and an inverter to realize the masking function compared to recent reconfigurable logic gate implementations using several FeFETs and complex differential logic. We fabricate the proposed circuit and demonstrate the functionality. Judicious placement of the proposed logic in the IC makes it acts as a hardware encryption key and enables encoding and decoding of the functional output without affecting the critical path timing delay. Also, we achieve comparable encryption probability with a limited number of encryption units. In addition, we show a peripheral programming scheme for reconfigurable logic by reusing the existing scan chain logic, hence obviating the need for specialized programming logic and circuitry for keybit distribution. Our analysis shows an average encryption probability of 97.43% with an increase of 2.24%/ 3.67% delay for the most critical path/ sum of 100 critical paths delay for ISCAS85 benchmarks. △ Less

Submitted 25 April, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

Journal ref: Nat Commun 13, 2235 (2022)

arXiv:2110.02495 [pdf, other]

Deep Random Forest with Ferroelectric Analog Content Addressable Memory

Authors: Xunzhao Yin, Franz Müller, Ann Franchesca Laguna, Chao Li, Wenwen Ye, Qingrong Huang, Qinming Zhang, Zhiguo Shi, Maximilian Lederer, Nellie Laleni, Shan Deng, Zijian Zhao, Michael Niemier, Xiaobo Sharon Hu, Cheng Zhuo, Thomas Kämpfe, Kai Ni

Abstract: Deep random forest (DRF), which incorporates the core features of deep learning and random forest (RF), exhibits comparable classification accuracy, interpretability, and low memory and computational overhead when compared with deep neural networks (DNNs) in various information processing tasks for edge intelligence. However, the development of efficient hardware to accelerate DRF is lagging behin… ▽ More Deep random forest (DRF), which incorporates the core features of deep learning and random forest (RF), exhibits comparable classification accuracy, interpretability, and low memory and computational overhead when compared with deep neural networks (DNNs) in various information processing tasks for edge intelligence. However, the development of efficient hardware to accelerate DRF is lagging behind its DNN counterparts. The key for hardware acceleration of DRF lies in efficiently realizing the branch-split operation at decision nodes when traversing a decision tree. In this work, we propose to implement DRF through simple associative searches realized with ferroelectric analog content addressable memory (ACAM). Utilizing only two ferroelectric field effect transistors (FeFETs), the ultra-compact ACAM cell can perform a branch-split operation with an energy-efficient associative search by storing the decision boundaries as the analog polarization states in an FeFET. The DRF accelerator architecture and the corresponding map** of the DRF model to the ACAM arrays are presented. The functionality, characteristics, and scalability of the FeFET ACAM based DRF and its robustness against FeFET device non-idealities are validated both in experiments and simulations. Evaluation results show that the FeFET ACAM DRF accelerator exhibits 10^6x/16x and 10^6x/2.5x improvements in terms of energy and latency when compared with other deep random forest hardware implementations on the state-of-the-art CPU/ReRAM, respectively. △ Less

Submitted 6 October, 2021; originally announced October 2021.

Comments: 44 pages, 16 figures

arXiv:2107.13088 [pdf, other]

doi 10.1063/5.0064860

Intrinsic synaptic plasticity of ferroelectric field effect transistors for online learning

Authors: Arnob Saha, A N M Nafiul Islam, Zijian Zhao, Shan Deng, Kai Ni, Abhronil Sengupta

Abstract: Nanoelectronic devices emulating neuro-synaptic functionalities through their intrinsic physics at low operating energies is imperative toward the realization of brain-like neuromorphic computers. In this work, we leverage the non-linear voltage dependent partial polarization switching of a ferroelectric field effect transistor to mimic plasticity characteristics of biological synapses. We provide… ▽ More Nanoelectronic devices emulating neuro-synaptic functionalities through their intrinsic physics at low operating energies is imperative toward the realization of brain-like neuromorphic computers. In this work, we leverage the non-linear voltage dependent partial polarization switching of a ferroelectric field effect transistor to mimic plasticity characteristics of biological synapses. We provide experimental measurements of the synaptic characteristics for a $28nm$ high-k metal gate technology based device and develop an experimentally calibrated device model for large-scale system performance prediction. Decoupled read-write paths, ultra-low programming energies and the possibility of arranging such devices in a cross-point architecture demonstrate the synaptic efficacy of the device. Our hardware-algorithm co-design analysis reveals that the intrinsic plasticity of the ferroelectric devices has potential to enable unsupervised local learning in edge devices with limited training data. △ Less

Submitted 16 September, 2021; v1 submitted 27 July, 2021; originally announced July 2021.

arXiv:2106.11757 [pdf, other]

Application-driven Design Exploration for Dense Ferroelectric Embedded Non-volatile Memories

Authors: Mohammad Mehdi Sharifi, Lillian Pentecost, Ramin Rajaei, Arman Kazemi, Qiuwen Lou, Gu-Yeon Wei, David Brooks, Kai Ni, X. Sharon Hu, Michael Niemier, Marco Donato

Abstract: The memory wall bottleneck is a key challenge across many data-intensive applications. Multi-level FeFET-based embedded non-volatile memories are a promising solution for denser and more energy-efficient on-chip memory. However, reliable multi-level cell storage requires careful optimizations to minimize the design overhead costs. In this work, we investigate the interplay between FeFET device cha… ▽ More The memory wall bottleneck is a key challenge across many data-intensive applications. Multi-level FeFET-based embedded non-volatile memories are a promising solution for denser and more energy-efficient on-chip memory. However, reliable multi-level cell storage requires careful optimizations to minimize the design overhead costs. In this work, we investigate the interplay between FeFET device characteristics, programming schemes, and memory array architecture, and explore different design choices to optimize performance, energy, area, and accuracy metrics for critical data-intensive workloads. From our cross-stack design exploration, we find that we can store DNN weights and social network graphs at a density of over 8MB/mm^2 and sub-2ns read access latency without loss in application accuracy. △ Less

Submitted 17 June, 2021; originally announced June 2021.

Comments: Accepted at ISLPED 2021

arXiv:2010.07261 [pdf, other]

Learning Improvised Chatbots from Adversarial Modifications of Natural Language Feedback

Authors: Makesh Narsimhan Sreedhar, Kun Ni, Siva Reddy

Abstract: The ubiquitous nature of chatbots and their interaction with users generate an enormous amount of data. Can we improve chatbots using this data? A self-feeding chatbot improves itself by asking natural language feedback when a user is dissatisfied with its response and uses this feedback as an additional training sample. However, user feedback in most cases contains extraneous sequences hindering… ▽ More The ubiquitous nature of chatbots and their interaction with users generate an enormous amount of data. Can we improve chatbots using this data? A self-feeding chatbot improves itself by asking natural language feedback when a user is dissatisfied with its response and uses this feedback as an additional training sample. However, user feedback in most cases contains extraneous sequences hindering their usefulness as a training sample. In this work, we propose a generative adversarial model that converts noisy feedback into a plausible natural response in a conversation. The generator's goal is to convert the feedback into a response that answers the user's previous utterance and to fool the discriminator which distinguishes feedback from natural responses. We show that augmenting original training data with these modified feedback responses improves the original chatbot performance from 69.94% to 75.96% in ranking correct responses on the Personachat dataset, a large improvement given that the original model is already trained on 131k samples. △ Less

Submitted 14 October, 2020; v1 submitted 14 October, 2020; originally announced October 2020.

Comments: Accepted for publication at Findings of EMNLP 2020

arXiv:2004.01866 [pdf]

doi 10.1109/TED.2020.2994896

FeCAM: A Universal Compact Digital and Analog Content Addressable Memory Using Ferroelectric

Authors: Xunzhao Yin, Chao Li, Qingrong Huang, Li Zhang, Michael Niemier, Xiaobo Sharon Hu, Cheng Zhuo, Kai Ni

Abstract: Ferroelectric field effect transistors (FeFETs) are being actively investigated with the potential for in-memory computing (IMC) over other non-volatile memories (NVMs). Content Addressable Memories (CAMs) are a form of IMC that performs parallel searches for matched entries over a memory array for a given input query. CAMs are widely used for data-centric applications that involve pattern matchin… ▽ More Ferroelectric field effect transistors (FeFETs) are being actively investigated with the potential for in-memory computing (IMC) over other non-volatile memories (NVMs). Content Addressable Memories (CAMs) are a form of IMC that performs parallel searches for matched entries over a memory array for a given input query. CAMs are widely used for data-centric applications that involve pattern matching and search functionality. To accommodate the ever expanding data, it is attractive to resort to analog CAM for memory density improvement. However, the digital CAM design nowadays based on standard CMOS or emerging nonvolatile memories (e.g., resistive storage devices) is already challenging due to area, power, and cost penalties. Thus, it can be extremely expensive to achieve analog CAM with those technologies due to added cell components. As such, we propose, for the first time, a universal compact FeFET based CAM design, FeCAM, with search and storage functionality enabled in digital and analog domain simultaneously. By exploiting the multi-level-cell (MLC) states of FeFET, FeCAM can store and search inputs in either digital or analog domain. We perform a device-circuit co-design of the proposed FeCAM and validate its functionality and performance using an experimentally calibrated FeFET model. Circuit level simulation results demonstrate that FeCAM can either store continuous matching ranges or encode 3-bit data in a single CAM cell. When compared with the existing digital CMOS based CAM approaches, FeCAM is found to improve both memory density by 22.4X and energy saving by 8.6/3.2X for analog/digital modes, respectively. In the CAM-related application, our evaluations show that FeCAM can achieve 60.5X/23.1X saving in area/search energy compared with conventional CMOS based CAMs. △ Less

Submitted 17 July, 2020; v1 submitted 4 April, 2020; originally announced April 2020.

Comments: 8 pages, 8 figures, accepted

Journal ref: IEEE Transactions on Electron Devices, 2020

arXiv:2004.00703 [pdf, other]

A Hybrid FeMFET-CMOS Analog Synapse Circuit for Neural Network Training and Inference

Authors: Arman Kazemi, Ramin Rajaei, Kai Ni, Suman Datta, Michael Niemier, X. Sharon Hu

Abstract: An analog synapse circuit based on ferroelectric-metal field-effect transistors is proposed, that offers 6-bit weight precision. The circuit is comprised of volatile least significant bits (LSBs) used solely during training, and non-volatile most significant bits (MSBs) used for both training and inference. The design works at a 1.8V logic-compatible voltage, provides 10^10 endurance cycles, and r… ▽ More An analog synapse circuit based on ferroelectric-metal field-effect transistors is proposed, that offers 6-bit weight precision. The circuit is comprised of volatile least significant bits (LSBs) used solely during training, and non-volatile most significant bits (MSBs) used for both training and inference. The design works at a 1.8V logic-compatible voltage, provides 10^10 endurance cycles, and requires only 250ps update pulses. A variant of LeNet trained with the proposed synapse achieves 98.2% accuracy on MNIST, which is only 0.4% lower than an ideal implementation of the same network with the same bit precision. Furthermore, the proposed synapse offers improvements of up to 26% in area, 44.8% in leakage power, 16.7% in LSB update pulse duration, and two orders of magnitude in endurance cycles, when compared to state-of-the-art hybrid synaptic circuits. Our proposed synapse can be extended to an 8-bit design, enabling a VGG-like network to achieve 88.8% accuracy on CIFAR-10 (only 0.8% lower than an ideal implementation of the same network). △ Less

Submitted 1 April, 2020; originally announced April 2020.

Comments: Accepted at ISCAS'20 for oral presentation

arXiv:1911.10232 [pdf, other]

doi 10.1109/BigData47090.2019.9005633

SWAG: Item Recommendations using Convolutions on Weighted Graphs

Authors: Amit Pande, Kai Ni, Venkataramani Kini

Abstract: Recent advancements in deep neural networks for graph-structured data have led to state-of-the-art performance on recommender system benchmarks. In this work, we present a Graph Convolutional Network (GCN) algorithm SWAG (Sample Weight and AGgregate), which combines efficient random walks and graph convolutions on weighted graphs to generate embeddings for nodes (items) that incorporate both graph… ▽ More Recent advancements in deep neural networks for graph-structured data have led to state-of-the-art performance on recommender system benchmarks. In this work, we present a Graph Convolutional Network (GCN) algorithm SWAG (Sample Weight and AGgregate), which combines efficient random walks and graph convolutions on weighted graphs to generate embeddings for nodes (items) that incorporate both graph structure as well as node feature information such as item-descriptions and item-images. The three important SWAG operations that enable us to efficiently generate node embeddings based on graph structures are (a) Sampling of graph to homogeneous structure, (b) Weighting the sampling, walks and convolution operations, and (c) using AGgregation functions for generating convolutions. The work is an adaptation of graphSAGE over weighted graphs. We deploy SWAG at Target and train it on a graph of more than 500K products sold online with over 50M edges. Offline and online evaluations reveal the benefit of using a graph-based approach and the benefits of weighing to produce high quality embeddings and product recommendations. △ Less

Submitted 22 November, 2019; originally announced November 2019.

Comments: 10 pages, 8 figures, 2019 IEEE BigData special session

Journal ref: 2019 IEEE International Conference on Big Data

arXiv:1806.10227 [pdf]

The Determinants For User Intention To Adopt Web Based Early Childhood Supplementary Educational Platform

Authors: Khoo Han Ni, Ahmad Suhaimi Baharudin, Kamal Karkonasasi

Abstract: Education is defined as the fundamental key for the success and development for any given society. Hence, the forming of a strong basic educational foundation is crucial to ensure the children to stay competitive and achieve extraordinary in the changing world. I-zLink Sdn. Bhd. is a partnership company that formed by four individuals with the commitment to develop an early childhood learning solu… ▽ More Education is defined as the fundamental key for the success and development for any given society. Hence, the forming of a strong basic educational foundation is crucial to ensure the children to stay competitive and achieve extraordinary in the changing world. I-zLink Sdn. Bhd. is a partnership company that formed by four individuals with the commitment to develop an early childhood learning solution named i-Future. The system, i-Future is a Supplementary Educational Platform that designs specially for children that fall between the age categories of two to six years which integrate with the latest technology to increase the engagement and flexibility in learning process as well as the effectiveness of study. In order to ensure that the system has a market value, an empirical study was conducted to determine the acceptance level and also the variables that would affect the user's intention to adopt i-Future. The variables used in the study include Perceived Ease of Use (PEU), Perceived Usefulness (PU), System Quality (SQ) and Social Norm (SN). Through the quantitative study, the relationship between these variables and user's intention to adopt i-Future was determined. Those variables which have significant impact on user's intention to adopt i-Future will be used to design i-Future. △ Less

Submitted 26 June, 2018; originally announced June 2018.

arXiv:1804.10669 [pdf, other]

Deep Speech Denoising with Vector Space Projections

Authors: Jeff Hetherly, Paul Gamble, Maria Barrios, Cory Stephenson, Karl Ni

Abstract: We propose an algorithm to denoise speakers from a single microphone in the presence of non-stationary and dynamic noise. Our approach is inspired by the recent success of neural network models separating speakers from other speakers and singers from instrumental accompaniment. Unlike prior art, we leverage embedding spaces produced with source-contrastive estimation, a technique derived from nega… ▽ More We propose an algorithm to denoise speakers from a single microphone in the presence of non-stationary and dynamic noise. Our approach is inspired by the recent success of neural network models separating speakers from other speakers and singers from instrumental accompaniment. Unlike prior art, we leverage embedding spaces produced with source-contrastive estimation, a technique derived from negative sampling techniques in natural language processing, while simultaneously obtaining a continuous inference mask. Our embedding space directly optimizes for the discrimination of speaker and noise by jointly modeling their characteristics. This space is generalizable in that it is not speaker or noise specific and is capable of denoising speech even if the model has not seen the speaker in the training set. Parameters are trained with dual objectives: one that promotes a selective bandpass filter that eliminates noise at time-frequency positions that exceed signal power, and another that proportionally splits time-frequency content between signal and noise. We compare to state of the art algorithms as well as traditional sparse non-negative matrix factorization solutions. The resulting algorithm avoids severe computational burden by providing a more intuitive and easily optimized approach, while achieving competitive accuracy. △ Less

Submitted 27 April, 2018; originally announced April 2018.

Comments: arXiv admin note: text overlap with arXiv:1705.04662

arXiv:1804.05053 [pdf, other]

Voices Obscured in Complex Environmental Settings (VOICES) corpus

Authors: Colleen Richey, Maria A. Barrios, Zeb Armstrong, Chris Bartels, Horacio Franco, Martin Graciarena, Aaron Lawson, Mahesh Kumar Nandwana, Allen Stauffer, Julien van Hout, Paul Gamble, Jeff Hetherly, Cory Stephenson, Karl Ni

Abstract: This paper introduces the Voices Obscured In Complex Environmental Settings (VOICES) corpus, a freely available dataset under Creative Commons BY 4.0. This dataset will promote speech and signal processing research of speech recorded by far-field microphones in noisy room conditions. Publicly available speech corpora are mostly composed of isolated speech at close-range microphony. A typical appro… ▽ More This paper introduces the Voices Obscured In Complex Environmental Settings (VOICES) corpus, a freely available dataset under Creative Commons BY 4.0. This dataset will promote speech and signal processing research of speech recorded by far-field microphones in noisy room conditions. Publicly available speech corpora are mostly composed of isolated speech at close-range microphony. A typical approach to better represent realistic scenarios, is to convolve clean speech with noise and simulated room response for model training. Despite these efforts, model performance degrades when tested against uncurated speech in natural conditions. For this corpus, audio was recorded in furnished rooms with background noise played in conjunction with foreground speech selected from the LibriSpeech corpus. Multiple sessions were recorded in each room to accommodate for all foreground speech-background noise combinations. Audio was recorded using twelve microphones placed throughout the room, resulting in 120 hours of audio per microphone. This work is a multi-organizational effort led by SRI International and Lab41 with the intent to push forward state-of-the-art distant microphone approaches in signal processing and speech recognition. △ Less

Submitted 15 May, 2018; v1 submitted 13 April, 2018; originally announced April 2018.

Comments: Submitted to Interspeech 2018

arXiv:1803.01891 [pdf, ps, other]

PySEAL: A Python wrapper implementation of the SEAL homomorphic encryption library

Authors: Alexander J. Titus, Shashwat Kishore, Todd Stavish, Stephanie M. Rogers, Karl Ni

Abstract: Motivation: The ability to perform operations on encrypted data has a growing number of applications in bioinformatics, with implications for data privacy in health care and biosecurity. The SEAL library is a popular implementation of fully homomorphic encryption developed in C++ by Microsoft Research. Despite the advantages of C++, Python is a flexible and dominant programming language that enabl… ▽ More Motivation: The ability to perform operations on encrypted data has a growing number of applications in bioinformatics, with implications for data privacy in health care and biosecurity. The SEAL library is a popular implementation of fully homomorphic encryption developed in C++ by Microsoft Research. Despite the advantages of C++, Python is a flexible and dominant programming language that enables rapid prototy** of bioinformatics pipelines. Results: In an effort to make homomorphic encryption accessible to a broader range of bioinformatics scientists and applications, we present a Python binding implementation of the popular homomorphic encryption library, SEAL, using pybind11. The software contains a Docker image to facilitate easy installation and execution of the SEAL build process. Availability: All code is publicly available at https://github.com/Lab41/PySEAL Contact: [email protected] Supplementary information: Supplementary information is available on the Lab41 GitHub. △ Less

Submitted 5 March, 2018; originally announced March 2018.

Comments: 2 pages, 1 figure

arXiv:1710.04288 [pdf, other]

Audio Concept Classification with Hierarchical Deep Neural Networks

Authors: Mirco Ravanelli, Benjamin Elizalde, Karl Ni, Gerald Friedland

Abstract: Audio-based multimedia retrieval tasks may identify semantic information in audio streams, i.e., audio concepts (such as music, laughter, or a revving engine). Conventional Gaussian-Mixture-Models have had some success in classifying a reduced set of audio concepts. However, multi-class classification can benefit from context window analysis and the discriminating power of deeper architectures. Al… ▽ More Audio-based multimedia retrieval tasks may identify semantic information in audio streams, i.e., audio concepts (such as music, laughter, or a revving engine). Conventional Gaussian-Mixture-Models have had some success in classifying a reduced set of audio concepts. However, multi-class classification can benefit from context window analysis and the discriminating power of deeper architectures. Although deep learning has shown promise in various applications such as speech and object recognition, it has not yet met the expectations for other fields such as audio concept classification. This paper explores, for the first time, the potential of deep learning in classifying audio concepts on User-Generated Content videos. The proposed system is comprised of two cascaded neural networks in a hierarchical configuration to analyze the short- and long-term context information. Our system outperforms a GMM approach by a relative 54%, a Neural Network by 33%, and a Deep Neural Network by 12% on the TRECVID-MED database △ Less

Submitted 11 October, 2017; originally announced October 2017.

Journal ref: EUSIPCO 2014

arXiv:1709.09254 [pdf, other]

Learning to Explain Non-Standard English Words and Phrases

Authors: Ke Ni, William Yang Wang

Abstract: We describe a data-driven approach for automatically explaining new, non-standard English expressions in a given sentence, building on a large dataset that includes 15 years of crowdsourced examples from UrbanDictionary.com. Unlike prior studies that focus on matching keywords from a slang dictionary, we investigate the possibility of learning a neural sequence-to-sequence model that generates exp… ▽ More We describe a data-driven approach for automatically explaining new, non-standard English expressions in a given sentence, building on a large dataset that includes 15 years of crowdsourced examples from UrbanDictionary.com. Unlike prior studies that focus on matching keywords from a slang dictionary, we investigate the possibility of learning a neural sequence-to-sequence model that generates explanations of unseen non-standard English expressions given context. We propose a dual encoder approach---a word-level encoder learns the representation of context, and a second character-level encoder to learn the hidden representation of the target non-standard expression. Our model can produce reasonable definitions of new non-standard English expressions given their context with certain confidence. △ Less

Submitted 26 September, 2017; originally announced September 2017.

Comments: IJCNLP 2017

arXiv:1705.04662 [pdf, other]

Monaural Audio Speaker Separation with Source Contrastive Estimation

Authors: Cory Stephenson, Patrick Callier, Abhinav Ganesh, Karl Ni

Abstract: We propose an algorithm to separate simultaneously speaking persons from each other, the "cocktail party problem", using a single microphone. Our approach involves a deep recurrent neural networks regression to a vector space that is descriptive of independent speakers. Such a vector space can embed empirically determined speaker characteristics and is optimized by distinguishing between speaker m… ▽ More We propose an algorithm to separate simultaneously speaking persons from each other, the "cocktail party problem", using a single microphone. Our approach involves a deep recurrent neural networks regression to a vector space that is descriptive of independent speakers. Such a vector space can embed empirically determined speaker characteristics and is optimized by distinguishing between speaker masks. We call this technique source-contrastive estimation. The methodology is inspired by negative sampling, which has seen success in natural language processing, where an embedding is learned by correlating and de-correlating a given input vector with output weights. Although the matrix determined by the output weights is dependent on a set of known speakers, we only use the input vectors during inference. Doing so will ensure that source separation is explicitly speaker-independent. Our approach is similar to recent deep neural network clustering and permutation-invariant training research; we use weighted spectral features and masks to augment individual speaker frequencies while filtering out other speakers. We avoid, however, the severe computational burden of other approaches with our technique. Furthermore, by training a vector space rather than combinations of different speakers or differences thereof, we avoid the so-called permutation problem during training. Our algorithm offers an intuitive, computationally efficient response to the cocktail party problem, and most importantly boasts better empirical performance than other current techniques. △ Less

Submitted 12 May, 2017; originally announced May 2017.

arXiv:1611.06962 [pdf, other]

Sampled Image Tagging and Retrieval Methods on User Generated Content

Authors: Karl Ni, Kyle Zaragoza, Charles Foster, Carmen Carrano, Barry Chen, Yonas Tesfaye, Alex Gude

Abstract: Traditional image tagging and retrieval algorithms have limited value as a result of being trained with heavily curated datasets. These limitations are most evident when arbitrary search words are used that do not intersect with training set labels. Weak labels from user generated content (UGC) found in the wild (e.g., Google Photos, FlickR, etc.) have an almost unlimited number of unique words in… ▽ More Traditional image tagging and retrieval algorithms have limited value as a result of being trained with heavily curated datasets. These limitations are most evident when arbitrary search words are used that do not intersect with training set labels. Weak labels from user generated content (UGC) found in the wild (e.g., Google Photos, FlickR, etc.) have an almost unlimited number of unique words in the metadata tags. Prior work on word embeddings successfully leveraged unstructured text with large vocabularies, and our proposed method seeks to apply similar cost functions to open source imagery. Specifically, we train a deep learning image tagging and retrieval system on large scale, user generated content (UGC) using sampling methods and joint optimization of word embeddings. By using the Yahoo! FlickR Creative Commons (YFCC100M) dataset, such an approach builds robustness to common unstructured data issues that include but are not limited to irrelevant tags, misspellings, multiple languages, polysemy, and tag imbalance. As a result, the final proposed algorithm will not only yield comparable results to state of the art in conventional image tagging, but will enable new capability to train algorithms on large, scale unstructured text in the YFCC100M dataset and outperform cited work in zero-shot capability. △ Less

Submitted 2 December, 2016; v1 submitted 21 November, 2016; originally announced November 2016.

arXiv:1511.06838 [pdf, other]

Map** Images to Sentiment Adjective Noun Pairs with Factorized Neural Nets

Authors: Takuya Narihira, Damian Borth, Stella X. Yu, Karl Ni, Trevor Darrell

Abstract: We consider the visual sentiment task of map** an image to an adjective noun pair (ANP) such as "cute baby". To capture the two-factor structure of our ANP semantics as well as to overcome annotation noise and ambiguity, we propose a novel factorized CNN model which learns separate representations for adjectives and nouns but optimizes the classification performance over their product. Our exper… ▽ More We consider the visual sentiment task of map** an image to an adjective noun pair (ANP) such as "cute baby". To capture the two-factor structure of our ANP semantics as well as to overcome annotation noise and ambiguity, we propose a novel factorized CNN model which learns separate representations for adjectives and nouns but optimizes the classification performance over their product. Our experiments on the publicly available SentiBank dataset show that our model significantly outperforms not only independent ANP classifiers on unseen ANPs and on retrieving images of novel ANPs, but also image captioning models which capture word semantics from co-occurrence of natural text; the latter turn out to be surprisingly poor at capturing the sentiment evoked by pure visual experience. That is, our factorized ANP CNN not only trains better from noisy labels, generalizes better to new images, but can also expands the ANP vocabulary on its own. △ Less

Submitted 20 November, 2015; originally announced November 2015.

arXiv:1503.01817 [pdf, other]

doi 10.1145/2812802

YFCC100M: The New Data in Multimedia Research

Authors: Bart Thomee, David A. Shamma, Gerald Friedland, Benjamin Elizalde, Karl Ni, Douglas Poland, Damian Borth, Li-Jia Li

Abstract: We present the Yahoo Flickr Creative Commons 100 Million Dataset (YFCC100M), the largest public multimedia collection that has ever been released. The dataset contains a total of 100 million media objects, of which approximately 99.2 million are photos and 0.8 million are videos, all of which carry a Creative Commons license. Each media object in the dataset is represented by several pieces of met… ▽ More We present the Yahoo Flickr Creative Commons 100 Million Dataset (YFCC100M), the largest public multimedia collection that has ever been released. The dataset contains a total of 100 million media objects, of which approximately 99.2 million are photos and 0.8 million are videos, all of which carry a Creative Commons license. Each media object in the dataset is represented by several pieces of metadata, e.g. Flickr identifier, owner name, camera, title, tags, geo, media source. The collection provides a comprehensive snapshot of how photos and videos were taken, described, and shared over the years, from the inception of Flickr in 2004 until early 2014. In this article we explain the rationale behind its creation, as well as the implications the dataset has for science, research, engineering, and development. We further present several new challenges in multimedia research that can now be expanded upon with our dataset. △ Less

Submitted 25 April, 2016; v1 submitted 5 March, 2015; originally announced March 2015.

ACM Class: H.3.7

Journal ref: Communications of the ACM, 59(2), pp. 64-73, 2016

arXiv:1502.03409 [pdf, other]

Large-Scale Deep Learning on the YFCC100M Dataset

Authors: Karl Ni, Roger Pearce, Kofi Boakye, Brian Van Essen, Damian Borth, Barry Chen, Eric Wang

Abstract: We present a work-in-progress snapshot of learning with a 15 billion parameter deep learning network on HPC architectures applied to the largest publicly available natural image and video dataset released to-date. Recent advancements in unsupervised deep neural networks suggest that scaling up such networks in both model and training dataset size can yield significant improvements in the learning… ▽ More We present a work-in-progress snapshot of learning with a 15 billion parameter deep learning network on HPC architectures applied to the largest publicly available natural image and video dataset released to-date. Recent advancements in unsupervised deep neural networks suggest that scaling up such networks in both model and training dataset size can yield significant improvements in the learning of concepts at the highest layers. We train our three-layer deep neural network on the Yahoo! Flickr Creative Commons 100M dataset. The dataset comprises approximately 99.2 million images and 800,000 user-created videos from Yahoo's Flickr image and video sharing platform. Training of our network takes eight days on 98 GPU nodes at the High Performance Computing Center at Lawrence Livermore National Laboratory. Encouraging preliminary results and future research directions are presented and discussed. △ Less

Submitted 11 February, 2015; originally announced February 2015.

Showing 1–47 of 47 results for author: Ni, K