-
Throughput and Fairness Trade-off Balancing for UAV-Enabled Wireless Communication Systems
Authors:
Kejie Ni,
**gqing Wang,
Wenchi Cheng,
Wei Zhang
Abstract:
Given the imperative of 6G networks' ubiquitous connectivity, along with the inherent mobility and cost-effectiveness of unmanned aerial vehicles (UAVs), UAVs play a critical role within 6G wireless networks. Despite advancements in enhancing the UAV-enabled communication systems' throughput in existing studies, there remains a notable gap in addressing issues concerning user fairness and quality-…
▽ More
Given the imperative of 6G networks' ubiquitous connectivity, along with the inherent mobility and cost-effectiveness of unmanned aerial vehicles (UAVs), UAVs play a critical role within 6G wireless networks. Despite advancements in enhancing the UAV-enabled communication systems' throughput in existing studies, there remains a notable gap in addressing issues concerning user fairness and quality-of-service (QoS) provisioning and lacks an effective scheme to depict the trade-off between system throughput and user fairness. To solve the above challenges, in this paper we introduce a novel fairness control scheme for UAV-enabled wireless communication systems based on a new weighted function. First, we propose a throughput combining model based on a new weighted function with fairness considering. Second, we formulate the optimization problem to maximize the weighted sum of all users' throughput. Third, we decompose the optimization problem and propose an efficient iterative algorithm to solve it. Finally, simulation results are provided to demonstrate the considerable potential of our proposed scheme in fairness and QoS provisioning.
△ Less
Submitted 7 June, 2024;
originally announced June 2024.
-
DenoDet: Attention as Deformable Multi-Subspace Feature Denoising for Target Detection in SAR Images
Authors:
Yimian Dai,
Minrui Zou,
Yuxuan Li,
Xiang Li,
Kang Ni,
Jian Yang
Abstract:
Synthetic Aperture Radar (SAR) target detection has long been impeded by inherent speckle noise and the prevalence of diminutive, ambiguous targets. While deep neural networks have advanced SAR target detection, their intrinsic low-frequency bias and static post-training weights falter with coherent noise and preserving subtle details across heterogeneous terrains. Motivated by traditional SAR ima…
▽ More
Synthetic Aperture Radar (SAR) target detection has long been impeded by inherent speckle noise and the prevalence of diminutive, ambiguous targets. While deep neural networks have advanced SAR target detection, their intrinsic low-frequency bias and static post-training weights falter with coherent noise and preserving subtle details across heterogeneous terrains. Motivated by traditional SAR image denoising, we propose DenoDet, a network aided by explicit frequency domain transform to calibrate convolutional biases and pay more attention to high-frequencies, forming a natural multi-scale subspace representation to detect targets from the perspective of multi-subspace denoising. We design TransDeno, a dynamic frequency domain attention module that performs as a transform domain soft thresholding operation, dynamically denoising across subspaces by preserving salient target signals and attenuating noise. To adaptively adjust the granularity of subspace processing, we also propose a deformable group fully-connected layer (DeGroFC) that dynamically varies the group conditioned on the input features. Without bells and whistles, our plug-and-play TransDeno sets state-of-the-art scores on multiple SAR target detection datasets. The code is available at https://github.com/GrokCV/GrokSAR.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
Robust Implementation of Retrieval-Augmented Generation on Edge-based Computing-in-Memory Architectures
Authors:
Ruiyang Qin,
Zheyu Yan,
Dewen Zeng,
Zhenge Jia,
Dancheng Liu,
Jianbo Liu,
Zhi Zheng,
Ningyuan Cao,
Kai Ni,
**jun Xiong,
Yiyu Shi
Abstract:
Large Language Models (LLMs) deployed on edge devices learn through fine-tuning and updating a certain portion of their parameters. Although such learning methods can be optimized to reduce resource utilization, the overall required resources remain a heavy burden on edge devices. Instead, Retrieval-Augmented Generation (RAG), a resource-efficient LLM learning method, can improve the quality of th…
▽ More
Large Language Models (LLMs) deployed on edge devices learn through fine-tuning and updating a certain portion of their parameters. Although such learning methods can be optimized to reduce resource utilization, the overall required resources remain a heavy burden on edge devices. Instead, Retrieval-Augmented Generation (RAG), a resource-efficient LLM learning method, can improve the quality of the LLM-generated content without updating model parameters. However, the RAG-based LLM may involve repetitive searches on the profile data in every user-LLM interaction. This search can lead to significant latency along with the accumulation of user data. Conventional efforts to decrease latency result in restricting the size of saved user data, thus reducing the scalability of RAG as user data continuously grows. It remains an open question: how to free RAG from the constraints of latency and scalability on edge devices? In this paper, we propose a novel framework to accelerate RAG via Computing-in-Memory (CiM) architectures. It accelerates matrix multiplications by performing in-situ computation inside the memory while avoiding the expensive data transfer between the computing unit and memory. Our framework, Robust CiM-backed RAG (RoCR), utilizing a novel contrastive learning-based training method and noise-aware training, can enable RAG to efficiently search profile data with CiM. To the best of our knowledge, this is the first work utilizing CiM to accelerate RAG.
△ Less
Submitted 7 May, 2024;
originally announced May 2024.
-
Automated Long Answer Grading with RiceChem Dataset
Authors:
Shashank Sonkar,
Kangqi Ni,
Lesa Tran Lu,
Kristi Kincaid,
John S. Hutchinson,
Richard G. Baraniuk
Abstract:
We introduce a new area of study in the field of educational Natural Language Processing: Automated Long Answer Grading (ALAG). Distinguishing itself from Automated Short Answer Grading (ASAG) and Automated Essay Grading (AEG), ALAG presents unique challenges due to the complexity and multifaceted nature of fact-based long answers. To study ALAG, we introduce RiceChem, a dataset derived from a col…
▽ More
We introduce a new area of study in the field of educational Natural Language Processing: Automated Long Answer Grading (ALAG). Distinguishing itself from Automated Short Answer Grading (ASAG) and Automated Essay Grading (AEG), ALAG presents unique challenges due to the complexity and multifaceted nature of fact-based long answers. To study ALAG, we introduce RiceChem, a dataset derived from a college chemistry course, featuring real student responses to long-answer questions with an average word count notably higher than typical ASAG datasets. We propose a novel approach to ALAG by formulating it as a rubric entailment problem, employing natural language inference models to verify whether each criterion, represented by a rubric item, is addressed in the student's response. This formulation enables the effective use of MNLI for transfer learning, significantly improving the performance of models on the RiceChem dataset. We demonstrate the importance of rubric-based formulation in ALAG, showcasing its superiority over traditional score-based approaches in capturing the nuances of student responses. We also investigate the performance of models in cold start scenarios, providing valuable insights into the practical deployment considerations in educational settings. Lastly, we benchmark state-of-the-art open-sourced Large Language Models (LLMs) on RiceChem and compare their results to GPT models, highlighting the increased complexity of ALAG compared to ASAG. Despite leveraging the benefits of a rubric-based approach and transfer learning from MNLI, the lower performance of LLMs on RiceChem underscores the significant difficulty posed by the ALAG task. With this work, we offer a fresh perspective on grading long, fact-based answers and introduce a new dataset to stimulate further research in this important area. Code: \url{https://github.com/luffycodes/Automated-Long-Answer-Grading}.
△ Less
Submitted 22 April, 2024;
originally announced April 2024.
-
Combined Static Analysis and Machine Learning Prediction for Application Debloating
Authors:
Chris Porter,
Sharjeel Khan,
Kangqi Ni,
Santosh Pande
Abstract:
Software debloating can effectively thwart certain code reuse attacks by reducing attack surfaces to break gadget chains. Approaches based on static analysis enable a reduced set of functions reachable at a callsite for execution by leveraging static properties of the callgraph. This achieves low runtime overhead, but the function set is conservatively computed, negatively affecting reduction. In…
▽ More
Software debloating can effectively thwart certain code reuse attacks by reducing attack surfaces to break gadget chains. Approaches based on static analysis enable a reduced set of functions reachable at a callsite for execution by leveraging static properties of the callgraph. This achieves low runtime overhead, but the function set is conservatively computed, negatively affecting reduction. In contrast, approaches based on machine learning (ML) have much better precision and can sharply reduce function sets, leading to significant improvement in attack surface. Nevertheless, mispredictions occur in ML-based approaches. These cause overheads, and worse, there is no clear way to distinguish between mispredictions and actual attacks.
In this work, we contend that a software debloating approach that incorporates ML-based predictions at runtime is realistic in a whole application setting, and that it can achieve significant attack surface reductions beyond the state of the art. We develop a framework, Predictive Debloat with Static Guarantees (PDSG). PDSG is fully sound and works on application source code. At runtime it predicts the dynamic callee set emanating from a callsite, and to resolve mispredictions, it employs a lightweight audit based on static invariants of call chains. We deduce the invariants offline and assert that they hold at runtime when there is a misprediction. To the best of our knowledge, it achieves the highest gadget reductions among similar techniques on SPEC CPU 2017, reducing 82.5% of the total gadgets on average. It triggers misprediction checks on only 3.8% of the total predictions invoked at runtime, and it leverages Datalog to verify dynamic call sequences conform to the static call relations. It has an overhead of 8.9%, which makes the scheme attractive for practical deployments.
△ Less
Submitted 29 March, 2024;
originally announced April 2024.
-
Paving the Way for Pass Disturb Free Vertical NAND Storage via A Dedicated and String-Compatible Pass Gate
Authors:
Zijian Zhao,
Sola Woo,
Khandker Akif Aabrar,
Sharadindu Gopal Kirtania,
Zhouhang Jiang,
Shan Deng,
Yi Xiao,
Halid Mulaosmanovic,
Stefan Duenkel,
Dominik Kleimaier,
Steven Soss,
Sven Beyer,
Rajiv Joshi,
Scott Meninger,
Mohamed Mohamed,
Kijoon Kim,
Jongho Woo,
Suhwan Lim,
Kwangsoo Kim,
Wanki Kim,
Daewon Ha,
Vijaykrishnan Narayanan,
Suman Datta,
Shimeng Yu,
Kai Ni
Abstract:
In this work, we propose a dual-port cell design to address the pass disturb in vertical NAND storage, which can pass signals through a dedicated and string-compatible pass gate. We demonstrate that: i) the pass disturb-free feature originates from weakening of the depolarization field by the pass bias at the high-${V}_{TH}$ (HVT) state and the screening of the applied field by channel at the low-…
▽ More
In this work, we propose a dual-port cell design to address the pass disturb in vertical NAND storage, which can pass signals through a dedicated and string-compatible pass gate. We demonstrate that: i) the pass disturb-free feature originates from weakening of the depolarization field by the pass bias at the high-${V}_{TH}$ (HVT) state and the screening of the applied field by channel at the low-${V}_{TH}$ (LVT) state; ii) combined simulations and experimental demonstrations of dual-port design verify the disturb-free operation in a NAND string, overcoming a key challenge in single-port designs; iii) the proposed design can be incorporated in a highly scaled vertical NAND FeFET string and the pass gate can be incorporated into the existing 3D NAND with the negligible overhead of the pass gate interconnection through a global bottom pass gate contact in the substrate.
△ Less
Submitted 7 March, 2024;
originally announced March 2024.
-
Pedagogical Alignment of Large Language Models
Authors:
Shashank Sonkar,
Kangqi Ni,
Sapana Chaudhary,
Richard G. Baraniuk
Abstract:
In this paper, we introduce the novel concept of pedagogically aligned Large Language Models (LLMs) that signifies a transformative shift in the application of LLMs within educational contexts. Rather than providing direct responses to user queries, pedagogically-aligned LLMs function as scaffolding tools, breaking complex problems into manageable subproblems and guiding students towards the final…
▽ More
In this paper, we introduce the novel concept of pedagogically aligned Large Language Models (LLMs) that signifies a transformative shift in the application of LLMs within educational contexts. Rather than providing direct responses to user queries, pedagogically-aligned LLMs function as scaffolding tools, breaking complex problems into manageable subproblems and guiding students towards the final answer through constructive feedback and hints. The objective is to equip learners with problem-solving strategies that deepen their understanding and internalization of the subject matter. Previous research in this field has primarily applied the supervised finetuning approach without framing the objective as an alignment problem, hence not employing reinforcement learning through human feedback (RLHF) methods. This study reinterprets the narrative by viewing the task through the lens of alignment and demonstrates how RLHF methods emerge naturally as a superior alternative for aligning LLM behaviour. Building on this perspective, we propose a novel approach for constructing a reward dataset specifically designed for the pedagogical alignment of LLMs. We apply three state-of-the-art RLHF algorithms and find that they outperform SFT significantly. Our qualitative analyses across model differences and hyperparameter sensitivity further validate the superiority of RLHF over SFT. Also, our study sheds light on the potential of online feedback for enhancing the performance of pedagogically-aligned LLMs, thus providing valuable insights for the advancement of these models in educational settings.
△ Less
Submitted 7 February, 2024;
originally announced February 2024.
-
Reconfigurable Frequency Multipliers Based on Complementary Ferroelectric Transistors
Authors:
Haotian Xu,
Jianyi Yang,
Cheng Zhuo,
Thomas Kämpfe,
Kai Ni,
Xunzhao Yin
Abstract:
Frequency multipliers, a class of essential electronic components, play a pivotal role in contemporary signal processing and communication systems. They serve as crucial building blocks for generating high-frequency signals by multiplying the frequency of an input signal. However, traditional frequency multipliers that rely on nonlinear devices often require energy- and area-consuming filtering an…
▽ More
Frequency multipliers, a class of essential electronic components, play a pivotal role in contemporary signal processing and communication systems. They serve as crucial building blocks for generating high-frequency signals by multiplying the frequency of an input signal. However, traditional frequency multipliers that rely on nonlinear devices often require energy- and area-consuming filtering and amplification circuits, and emerging designs based on an ambipolar ferroelectric transistor require costly non-trivial characteristic tuning or complex technology process. In this paper, we show that a pair of standard ferroelectric field effect transistors (FeFETs) can be used to build compact frequency multipliers without aforementioned technology issues. By leveraging the tunable parabolic shape of the 2FeFET structures' transfer characteristics, we propose four reconfigurable frequency multipliers, which can switch between signal transmission and frequency doubling. Furthermore, based on the 2FeFET structures, we propose four frequency multipliers that realize triple, quadruple frequency modes, elucidating a scalable methodology to generate more multiplication harmonics of the input frequency. Performance metrics such as maximum operating frequency, power, etc., are evaluated and compared with existing works. We also implement a practical case of frequency modulation scheme based on the proposed reconfigurable multipliers without additional devices. Our work provides a novel path of scalable and reconfigurable frequency multiplier designs based on devices that have characteristics similar to FeFETs, and show that FeFETs are a promising candidate for signal processing and communication systems in terms of maximum operating frequency and power.
△ Less
Submitted 28 December, 2023;
originally announced December 2023.
-
Low Power and Temperature-Resilient Compute-In-Memory Based on Subthreshold-FeFET
Authors:
Yifei Zhou,
Xuchu Huang,
Jianyi Yang,
Kai Ni,
Hussam Amrouch,
Cheng Zhuo,
Xunzhao Yin
Abstract:
Compute-in-memory (CiM) is a promising solution for addressing the challenges of artificial intelligence (AI) and the Internet of Things (IoT) hardware such as 'memory wall' issue. Specifically, CiM employing nonvolatile memory (NVM) devices in a crossbar structure can efficiently accelerate multiply-accumulation (MAC) computation, a crucial operator in neural networks among various AI models. Low…
▽ More
Compute-in-memory (CiM) is a promising solution for addressing the challenges of artificial intelligence (AI) and the Internet of Things (IoT) hardware such as 'memory wall' issue. Specifically, CiM employing nonvolatile memory (NVM) devices in a crossbar structure can efficiently accelerate multiply-accumulation (MAC) computation, a crucial operator in neural networks among various AI models. Low power CiM designs are thus highly desired for further energy efficiency optimization on AI models. Ferroelectric FET (FeFET), an emerging device, is attractive for building ultra-low power CiM array due to CMOS compatibility, high ION/IOFF ratio, etc. Recent studies have explored FeFET based CiM designs that achieve low power consumption. Nevertheless, subthreshold-operated FeFETs, where the operating voltages are scaled down to the subthreshold region to reduce array power consumption, are particularly vulnerable to temperature drift, leading to accuracy degradation. To address this challenge, we propose a temperature-resilient 2T-1FeFET CiM design that performs MAC operations reliably at subthreahold region from 0 to 85 Celsius, while consuming ultra-low power. Benchmarked against the VGG neural network architecture running the CIFAR-10 dataset, the proposed 2T-1FeFET CiM design achieves 89.45% CIFAR-10 test accuracy. Compared to previous FeFET based CiM designs, it exhibits immunity to temperature drift at an 8-bit wordlength scale, and achieves better energy efficiency with 2866 TOPS/W.
△ Less
Submitted 10 January, 2024; v1 submitted 28 December, 2023;
originally announced December 2023.
-
Variation-Resilient FeFET-Based In-Memory Computing Leveraging Probabilistic Deep Learning
Authors:
Bibhas Manna,
Arnob Saha,
Zhouhang Jiang,
Kai Ni,
Abhronil Sengupta
Abstract:
Reliability issues stemming from device level non-idealities of non-volatile emerging technologies like ferroelectric field-effect transistors (FeFET), especially at scaled dimensions, cause substantial degradation in the accuracy of In-Memory crossbar-based AI systems. In this work, we present a variation-aware design technique to characterize the device level variations and to mitigate their imp…
▽ More
Reliability issues stemming from device level non-idealities of non-volatile emerging technologies like ferroelectric field-effect transistors (FeFET), especially at scaled dimensions, cause substantial degradation in the accuracy of In-Memory crossbar-based AI systems. In this work, we present a variation-aware design technique to characterize the device level variations and to mitigate their impact on hardware accuracy employing a Bayesian Neural Network (BNN) approach. An effective conductance variation model is derived from the experimental measurements of cycle-to-cycle (C2C) and device-to-device (D2D) variations performed on FeFET devices fabricated using 28 nm high-$k$ metal gate technology. The variations were found to be a function of different conductance states within the given programming range, which sharply contrasts earlier efforts where a fixed variation dispersion was considered for all conductance values. Such variation characteristics formulated for three different device sizes at different read voltages were provided as prior variation information to the BNN to yield a more exact and reliable inference. Near-ideal accuracy for shallow networks (MLP5 and LeNet models) on the MNIST dataset and limited accuracy decline by $\sim$3.8-16.1% for deeper AlexNet models on CIFAR10 dataset under a wide range of variations corresponding to different device sizes and read voltages, demonstrates the efficacy of our proposed device-algorithm co-design technique.
△ Less
Submitted 13 March, 2024; v1 submitted 24 December, 2023;
originally announced December 2023.
-
SEE-MCAM: Scalable Multi-bit FeFET Content Addressable Memories for Energy Efficient Associative Search
Authors:
Shengxi Shou,
Che-Kai Liu,
Sanggeon Yun,
Zishen Wan,
Kai Ni,
Mohsen Imani,
X. Sharon Hu,
Jianyi Yang,
Cheng Zhuo,
Xunzhao Yin
Abstract:
In this work, we propose SEE-MCAM, scalable and compact multi-bit CAM (MCAM) designs that utilize the three-terminal ferroelectric FET (FeFET) as the proxy. By exploiting the multi-level-cell characteristics of FeFETs, our proposed SEE-MCAM designs enable multi-bit associative search functions and achieve better energy efficiency and performance than existing FeFET-based CAM designs. We validated…
▽ More
In this work, we propose SEE-MCAM, scalable and compact multi-bit CAM (MCAM) designs that utilize the three-terminal ferroelectric FET (FeFET) as the proxy. By exploiting the multi-level-cell characteristics of FeFETs, our proposed SEE-MCAM designs enable multi-bit associative search functions and achieve better energy efficiency and performance than existing FeFET-based CAM designs. We validated the functionality of our proposed designs by achieving 3 bits per cell CAM functionality, resulting in 3x improvement in storage density. The area per bit of the proposed SEE-MCAM cell is 8% of the conventional CMOS CAM. We thoroughly investigated the scalability and robustness of the proposed design. Evaluation results suggest that the proposed 2FeFET-1T SEE-MCAM achieves 9.8x more energy efficiency and 1.6x less search latency compared to the CMOS CAM, respectively. When compared to existing MCAM designs, the proposed SEE-MCAM can achieve 8.7x and 4.9x more energy efficiency than ReRAM-based and FeFET-based MCAMs, respectively. Benchmarking results show that our approach provides up to 3 orders of magnitude improvement in speedup and energy efficiency over a GPU implementation in accelerating a novel quantized hyperdimensional computing (HDC) application.
△ Less
Submitted 7 October, 2023;
originally announced October 2023.
-
A Ferroelectric Compute-in-Memory Annealer for Combinatorial Optimization Problems
Authors:
Xunzhao Yin,
Yu Qian,
Alptekin Vardar,
Marcel Gunther,
Franz Muller,
Nellie Laleni,
Zijian Zhao,
Zhouhang Jiang,
Zhiguo Shi,
Yiyu Shi,
Xiao Gong,
Cheng Zhuo,
Thomas Kampfe,
Kai Ni
Abstract:
Computationally hard combinatorial optimization problems (COPs) are ubiquitous in many applications, including logistical planning, resource allocation, chip design, drug explorations, and more. Due to their critical significance and the inability of conventional hardware in efficiently handling scaled COPs, there is a growing interest in develo** computing hardware tailored specifically for COP…
▽ More
Computationally hard combinatorial optimization problems (COPs) are ubiquitous in many applications, including logistical planning, resource allocation, chip design, drug explorations, and more. Due to their critical significance and the inability of conventional hardware in efficiently handling scaled COPs, there is a growing interest in develo** computing hardware tailored specifically for COPs, including digital annealers, dynamical Ising machines, and quantum/photonic systems. However, significant hurdles still remain, such as the memory access issue, the system scalability and restricted applicability to certain types of COPs, and VLSI-incompatibility, respectively. Here, a ferroelectric field effect transistor (FeFET) based compute-in-memory (CiM) annealer is proposed. After converting COPs into quadratic unconstrained binary optimization (QUBO) formulations, a hardware-algorithm co-design is conducted, yielding an energy-efficient, versatile, and scalable hardware for COPs. To accelerate the core vector-matrix-vector (VMV) multiplication of QUBO formulations, a FeFET based CiM array is exploited, which can accelerate the intended operation in-situ due to its unique three-terminal structure. In particular, a lossless compression technique is proposed to prune typically sparse QUBO matrix to reduce hardware cost. Furthermore, a multi-epoch simulated annealing (MESA) algorithm is proposed to replace conventional simulated annealing for its faster convergence and better solution quality. The effectiveness of the proposed techniques is validated through the utilization of developed chip prototypes for successfully solving graph coloring problem, indicating great promise of FeFET CiM annealer in solving general COPs.
△ Less
Submitted 24 September, 2023;
originally announced September 2023.
-
Embedding Security into Ferroelectric FET Array via In-Situ Memory Operation
Authors:
Yixin Xu,
Yi Xiao,
Zijian Zhao,
Franz Müller,
Alptekin Vardar,
Xiao Gong,
Sumitha George,
Thomas Kämpfe,
Vijaykrishnan Narayanan,
Kai Ni
Abstract:
Non-volatile memories (NVMs) have the potential to reshape next-generation memory systems because of their promising properties of near-zero leakage power consumption, high density and non-volatility. However, NVMs also face critical security threats that exploit the non-volatile property. Compared to volatile memory, the capability of retaining data even after power down makes NVM more vulnerable…
▽ More
Non-volatile memories (NVMs) have the potential to reshape next-generation memory systems because of their promising properties of near-zero leakage power consumption, high density and non-volatility. However, NVMs also face critical security threats that exploit the non-volatile property. Compared to volatile memory, the capability of retaining data even after power down makes NVM more vulnerable. Existing solutions to address the security issues of NVMs are mainly based on Advanced Encryption Standard (AES), which incurs significant performance and power overhead. In this paper, we propose a lightweight memory encryption/decryption scheme by exploiting in-situ memory operations with negligible overhead. To validate the feasibility of the encryption/decryption scheme, device-level and array-level experiments are performed using ferroelectric field effect transistor (FeFET) as an example NVM without loss of generality. Besides, a comprehensive evaluation is performed on a 128x128 FeFET AND-type memory array in terms of area, latency, power and throughput. Compared with the AES-based scheme, our scheme shows around 22.6x/14.1x increase in encryption/decryption throughput with negligible power penalty. Furthermore, we evaluate the performance of our scheme over the AES-based scheme when deploying different neural network workloads. Our scheme yields significant latency reduction by 90% on average for encryption and decryption processes.
△ Less
Submitted 2 June, 2023;
originally announced June 2023.
-
Powering Disturb-Free Reconfigurable Computing and Tunable Analog Electronics with Dual-Port Ferroelectric FET
Authors:
Zijian Zhao,
Shan Deng,
Swetaki Chatterjee,
Zhouhang Jiang,
Muhammad Shaffatul Islam,
Yi Xiao,
Yixin Xu,
Scott Meninger,
Mohamed Mohamed,
Rajiv Joshi,
Yogesh Singh Chauhan,
Halid Mulaosmanovic,
Stefan Duenkel,
Dominik Kleimaier,
Sven Beyer,
Hussam Amrouch,
Vijaykrishnan Narayanan,
Kai Ni
Abstract:
Single-port ferroelectric FET (FeFET) that performs write and read operations on the same electrical gate prevents its wide application in tunable analog electronics and suffers from read disturb, especially to the high-threshold voltage (VTH) state as the retention energy barrier is reduced by the applied read bias. To address both issues, we propose to adopt a read disturb-free dual-port FeFET w…
▽ More
Single-port ferroelectric FET (FeFET) that performs write and read operations on the same electrical gate prevents its wide application in tunable analog electronics and suffers from read disturb, especially to the high-threshold voltage (VTH) state as the retention energy barrier is reduced by the applied read bias. To address both issues, we propose to adopt a read disturb-free dual-port FeFET where write is performed on the gate featuring a ferroelectric layer and the read is done on a separate gate featuring a non-ferroelectric dielectric. Combining the unique structure and the separate read gate, read disturb is eliminated as the applied field is aligned with polarization in the high-VTH state and thus improving its stability, while it is screened by the channel inversion charge and exerts no negative impact on the low-VTH state stability. Comprehensive theoretical and experimental validation have been performed on fully-depleted silicon-on-insulator (FDSOI) FeFETs integrated on 22 nm platform, which intrinsically has dual ports with its buried oxide layer acting as the non-ferroelectric dielectric. Novel applications that can exploit the proposed dual-port FeFET are proposed and experimentally demonstrated for the first time, including FPGA that harnesses its read disturb-free feature and tunable analog electronics (e.g., frequency tunable ring oscillator in this work) leveraging the separated write and read paths.
△ Less
Submitted 2 May, 2023;
originally announced May 2023.
-
X-ray Recognition: Patient identification from X-rays using a contrastive objective
Authors:
Hao Liang,
Kevin Ni,
Guha Balakrishnan
Abstract:
Recent research demonstrates that deep learning models are capable of precisely extracting bio-information (e.g. race, gender and age) from patients' Chest X-Rays (CXRs). In this paper, we further show that deep learning models are also surprisingly accurate at recognition, i.e., distinguishing CXRs belonging to the same patient from those belonging to different patients. These findings suggest po…
▽ More
Recent research demonstrates that deep learning models are capable of precisely extracting bio-information (e.g. race, gender and age) from patients' Chest X-Rays (CXRs). In this paper, we further show that deep learning models are also surprisingly accurate at recognition, i.e., distinguishing CXRs belonging to the same patient from those belonging to different patients. These findings suggest potential privacy considerations that the medical imaging community should consider with the proliferation of large public CXR databases.
△ Less
Submitted 28 April, 2023;
originally announced May 2023.
-
Visualizing chest X-ray dataset biases using GANs
Authors:
Hao Liang,
Kevin Ni,
Guha Balakrishnan
Abstract:
Recent work demonstrates that images from various chest X-ray datasets contain visual features that are strongly correlated with protected demographic attributes like race and gender. This finding raises issues of fairness, since some of these factors may be used by downstream algorithms for clinical predictions. In this work, we propose a framework, using generative adversarial networks (GANs), t…
▽ More
Recent work demonstrates that images from various chest X-ray datasets contain visual features that are strongly correlated with protected demographic attributes like race and gender. This finding raises issues of fairness, since some of these factors may be used by downstream algorithms for clinical predictions. In this work, we propose a framework, using generative adversarial networks (GANs), to visualize what features are most different between X-rays belonging to two demographic subgroups.
△ Less
Submitted 5 September, 2023; v1 submitted 28 April, 2023;
originally announced May 2023.
-
Voltage-controlled Cryogenic Boolean Logic Family Based on Ferroelectric SQUID
Authors:
Shamiul Alam,
Md Shafayat Hossain,
Kai Ni,
Vijaykrishnan Narayanan,
Ahmedullah Aziz
Abstract:
The recent progress in quantum computing and space exploration led to a surge in interest in cryogenic electronics. Superconducting devices such as Josephson junction, Josephson field effect transistor, cryotron, and superconducting quantum interference device (SQUID) are traditionally used to build cryogenic logic gates. However, due to the superconducting nature, gate-voltage-based control of th…
▽ More
The recent progress in quantum computing and space exploration led to a surge in interest in cryogenic electronics. Superconducting devices such as Josephson junction, Josephson field effect transistor, cryotron, and superconducting quantum interference device (SQUID) are traditionally used to build cryogenic logic gates. However, due to the superconducting nature, gate-voltage-based control of these devices is extremely difficult. Even more challenging is to cascade the logic gates because most of these devices require current bias for their operation. Therefore, these devices are not as convenient as the semiconducting transistors to design logic gates. Here, to overcome these challenges, we propose a ferroelectric SQUID (FeSQUID) based voltage-controlled logic gates. FeSQUID exhibits two different critical current levels for two different voltage-switchable polarization states of the ferroelectric. We utilize the polarization-dependent (hence, voltage-controllable) superconducting to resistive switching of FeSQUID to design Boolean logic gates such as Copy, NOT, AND, and OR gates. The operations of these gates are verified using a Verilog-A-based compact model of FeSQUID. Finally, to demonstrate the fanning out capability of FeSQUID-based logic family, we simulate a 2-input XOR gate using FeSQUID-based NOT, AND, and OR gates. Together with the ongoing progress on FeSQUID-based non-volatile memory, our designed FeSQUID-based logic family will enable all-FeSQUID based cryogenic computer, ensure minimum mismatch between logic and memory blocks in terms of speed, power consumption, and fabrication process.
△ Less
Submitted 15 December, 2022;
originally announced December 2022.
-
Eliminating Leakage in Volatile Memory with Anti-Ferroelectric Transistors
Authors:
Hongtao Zhong,
Zijie Zheng,
Leming Jiao,
Zuopu Zhou,
Chen Sun,
Xiaoyang Ma,
Vijaykrishnan Narayanan,
Huazhong Yang,
Kai Ni,
Xiao Gong,
Xueqing Li
Abstract:
Cache serves as a temporary data memory module in many general-purpose processors and domain-specific accelerators. Its density, power, speed, and reliability play a critical role in enhancing the overall system performance and quality of service. Conventional volatile memories, including static random-access memory (SRAM) and embedded dynamic random-access memory (eDRAM) in the complementary meta…
▽ More
Cache serves as a temporary data memory module in many general-purpose processors and domain-specific accelerators. Its density, power, speed, and reliability play a critical role in enhancing the overall system performance and quality of service. Conventional volatile memories, including static random-access memory (SRAM) and embedded dynamic random-access memory (eDRAM) in the complementary metal-oxide-semiconductor technology, have high performance and good reliability. However, the inherent leakage in both SRAM and eDRAM hinders further improvement towards smaller feature sizes and higher energy efficiency. Although the emerging nonvolatile memories can eliminate the leakage efficiently, the penalties of lower speed and degraded reliability are significant. This article reveals a new opportunity towards leakage-free volatile static memory beyond the known paradigms of existing volatile and nonvolatile memories. By engineering a double-well energy landscape with the assistance of a clam** voltage bias, leakage-free and refresh-free state retention of volatile memory is achieved for the first time. This new memory is highlighted by both the ultra-low leakage of nonvolatile memories and the speed, energy, and reliability advantages of volatile memories. A proof-of-concept memory is demonstrated using in-house anti-ferroelectric field-effect transistors (AFeFETs), delivering an extrapolated endurance of about 1012 cycles, a retention time of over 10 years, and no subthreshold channel leakage current. Such a new concept of AFeFET-based memory enables an improved balance between density, power, and reliability beyond all existing memory solutions.
△ Less
Submitted 1 February, 2023; v1 submitted 9 December, 2022;
originally announced December 2022.
-
Ferroelectric FET based Context-Switching FPGA Enabling Dynamic Reconfiguration for Adaptive Deep Learning Machines
Authors:
Yixin Xu,
Zijian Zhao,
Yi Xiao,
Tongguang Yu,
Halid Mulaosmanovic,
Dominik Kleimaier,
Stefan Duenkel,
Sven Beyer,
Xiao Gong,
Rajiv Joshi,
X. Sharon Hu,
Shixian Wen,
Amanda Sofie Rios,
Kiran Lekkala,
Laurent Itti,
Eric Homan,
Sumitha George,
Vijaykrishnan Narayanan,
Kai Ni
Abstract:
Field Programmable Gate Array (FPGA) is widely used in acceleration of deep learning applications because of its reconfigurability, flexibility, and fast time-to-market. However, conventional FPGA suffers from the tradeoff between chip area and reconfiguration latency, making efficient FPGA accelerations that require switching between multiple configurations still elusive. In this paper, we perfor…
▽ More
Field Programmable Gate Array (FPGA) is widely used in acceleration of deep learning applications because of its reconfigurability, flexibility, and fast time-to-market. However, conventional FPGA suffers from the tradeoff between chip area and reconfiguration latency, making efficient FPGA accelerations that require switching between multiple configurations still elusive. In this paper, we perform technology-circuit-architecture co-design to break this tradeoff with no additional area cost and lower power consumption compared with conventional designs while providing dynamic reconfiguration, which can hide the reconfiguration time behind the execution time. Leveraging the intrinsic transistor structure and non-volatility of ferroelectric FET (FeFET), compact FPGA primitives are proposed and experimentally verified, including 1FeFET look-up table (LUT) cell, 1FeFET routing cell for connection blocks (CBs) and switch boxes (SBs). To support dynamic reconfiguration, two local copies of primitives are placed in parallel, which enables loading of arbitrary configuration without interrupting the active configuration execution. A comprehensive evaluation shows that compared with the SRAM-based FPGA, our dynamic reconfiguration design shows 63.0%/71.1% reduction in LUT/CB area and 82.7%/53.6% reduction in CB/SB power consumption with minimal penalty in the critical path delay (9.6%). We further implement a Super-Sub network model to show the benefit from the context-switching capability of our design. We also evaluate the timing performance of our design over conventional FPGA in various application scenarios. In one scenario that users switch between two preloaded configurations, our design yields significant time saving by 78.7% on average. In the other scenario of implementing multiple configurations with dynamic reconfiguration, our design offers time saving of 20.3% on average.
△ Less
Submitted 30 November, 2022;
originally announced December 2022.
-
Ground Plane Matters: Picking Up Ground Plane Prior in Monocular 3D Object Detection
Authors:
Fan Yang,
Xinhao Xu,
Hui Chen,
Yuchen Guo,
Jungong Han,
Kai Ni,
Guiguang Ding
Abstract:
The ground plane prior is a very informative geometry clue in monocular 3D object detection (M3OD). However, it has been neglected by most mainstream methods. In this paper, we identify two key factors that limit the applicability of ground plane prior: the projection point localization issue and the ground plane tilt issue. To pick up the ground plane prior for M3OD, we propose a Ground Plane Enh…
▽ More
The ground plane prior is a very informative geometry clue in monocular 3D object detection (M3OD). However, it has been neglected by most mainstream methods. In this paper, we identify two key factors that limit the applicability of ground plane prior: the projection point localization issue and the ground plane tilt issue. To pick up the ground plane prior for M3OD, we propose a Ground Plane Enhanced Network (GPENet) which resolves both issues at one go. For the projection point localization issue, instead of using the bottom vertices or bottom center of the 3D bounding box (BBox), we leverage the object's ground contact points, which are explicit pixels in the image and easy for the neural network to detect. For the ground plane tilt problem, our GPENet estimates the horizon line in the image and derives a novel mathematical expression to accurately estimate the ground plane equation. An unsupervised vertical edge mining algorithm is also proposed to address the occlusion of the horizon line. Furthermore, we design a novel 3D bounding box deduction method based on a dynamic back projection algorithm, which could take advantage of the accurate contact points and the ground plane equation. Additionally, using only M3OD labels, contact point and horizon line pseudo labels can be easily generated with NO extra data collection and label annotation cost. Extensive experiments on the popular KITTI benchmark show that our GPENet can outperform other methods and achieve state-of-the-art performance, well demonstrating the effectiveness and the superiority of the proposed approach. Moreover, our GPENet works better than other methods in cross-dataset evaluation on the nuScenes dataset. Our code and models will be published.
△ Less
Submitted 2 November, 2022;
originally announced November 2022.
-
Hybrid Stochastic Synapses Enabled by Scaled Ferroelectric Field-effect Transistors
Authors:
A N M Nafiul Islam,
Arnob Saha,
Zhouhang Jiang,
Kai Ni,
Abhronil Sengupta
Abstract:
Achieving brain-like density and performance in neuromorphic computers necessitates scaling down the size of nanodevices emulating neuro-synaptic functionalities. However, scaling nanodevices results in reduction of programming resolution and emergence of stochastic non-idealities. While prior work has mainly focused on binary transitions, in this work we leverage the stochastic switching of a thr…
▽ More
Achieving brain-like density and performance in neuromorphic computers necessitates scaling down the size of nanodevices emulating neuro-synaptic functionalities. However, scaling nanodevices results in reduction of programming resolution and emergence of stochastic non-idealities. While prior work has mainly focused on binary transitions, in this work we leverage the stochastic switching of a three-state ferroelectric field effect transistor (FeFET) to implement a long-term and short-term 2-tier stochastic synaptic memory with a single device. Experimental measurements are performed on a scaled 28nm high-$k$ metal gate technology-based device to develop a probabilistic model of the hybrid stochastic synapse. In addition to the advantage of ultra-low programming energies afforded by scaling, our hardware-algorithm co-design analysis reveals the efficacy of the 2-tier memory in comparison to binary stochastic synapses in on-chip learning tasks -- paving the way for algorithms exploiting multi-state devices with probabilistic transitions beyond deterministic ones.
△ Less
Submitted 10 March, 2023; v1 submitted 27 September, 2022;
originally announced September 2022.
-
A Homogeneous Processing Fabric for Matrix-Vector Multiplication and Associative Search Using Ferroelectric Time-Domain Compute-in-Memory
Authors:
Xunzhao Yin,
Qingrong Huang,
Franz Müller,
Shan Deng,
Alptekin Vardar,
Sourav De,
Zhouhang Jiang,
Mohsen Imani,
Cheng Zhuo,
Thomas Kämpfe,
Kai Ni
Abstract:
In this work, we propose a ferroelectric FET(FeFET) time-domain compute-in-memory (TD-CiM) array as a homogeneous processing fabric for binary multiplication-accumulation (MAC) and content addressable memory (CAM). We demonstrate that: i) the XOR(XNOR)/AND logic function can be realized using a single cell composed of 2FeFETs connected in series; ii) a two-phase computation in an inverter chain wi…
▽ More
In this work, we propose a ferroelectric FET(FeFET) time-domain compute-in-memory (TD-CiM) array as a homogeneous processing fabric for binary multiplication-accumulation (MAC) and content addressable memory (CAM). We demonstrate that: i) the XOR(XNOR)/AND logic function can be realized using a single cell composed of 2FeFETs connected in series; ii) a two-phase computation in an inverter chain with each stage featuring the XOR/AND cell to control the associated capacitor loading and the computation results of binary MAC and CAM are reflected in the chain output signal delay, illustrating full digital compatibility; iii) comprehensive theoretical and experimental validation of the proposed 2FeFET cell and inverter delay chains and their robustness against FeFET variation; iv) the homogeneous processing fabric is applied in hyperdimensional computing to show dynamic and fine-grain resource allocation to accommodate different tasks requiring varying demands over the binary MAC and CAM resources.
△ Less
Submitted 24 September, 2022;
originally announced September 2022.
-
Ferroelectric FET-based strong physical unclonable function: a low-power, high-reliable and reconfigurable solution for Internet-of-Things security
Authors:
Xinrui Guo,
Xiaoyang Ma,
Franz Muller,
Kai Ni,
Thomas Kampfe,
Yongpan Liu,
Vijaykrishnan Narayanan,
Xueqing Li
Abstract:
Hardware security has been a key concern in modern information technologies. Especially, as the number of Internet-of-Things (IoT) devices grows rapidly, to protect the device security with low-cost security primitives becomes essential, among which Physical Unclonable Function (PUF) is a widely-used solution. In this paper, we propose the first FeFET-based strong PUF exploiting the cycle-to-cycle…
▽ More
Hardware security has been a key concern in modern information technologies. Especially, as the number of Internet-of-Things (IoT) devices grows rapidly, to protect the device security with low-cost security primitives becomes essential, among which Physical Unclonable Function (PUF) is a widely-used solution. In this paper, we propose the first FeFET-based strong PUF exploiting the cycle-to-cycle (C2C) variation of FeFETs as the entropy source. Based on the experimental measurements, the proposed PUF shows satisfying performance including high uniformity, uniqueness, reconfigurability and reliability. To resist machine-learning attack, XOR structure was introduced, and simulations show that our proposed PUF has similar resistance to existing attack models with traditional arbiter PUFs. Furthermore, our design is shown to be power-efficient, and highly robust to write voltage, temperature and device size, which makes it a competitive security solution for Internet-of-Things edge devices.
△ Less
Submitted 31 August, 2022;
originally announced August 2022.
-
Intellectual Property Evaluation Utilizing Machine Learning
Authors:
**xin Ding,
Yuxin Huang,
Keyang Ni,
Xueyao Wang,
Yinxiao Wang,
Yucheng Wang
Abstract:
Intellectual properties is increasingly important in the economic development. To solve the pain points by traditional methods in IP evaluation, we are develo** a new technology with machine learning as the core. We have built an online platform and will expand our business in the Greater Bay Area with plans.
Intellectual properties is increasingly important in the economic development. To solve the pain points by traditional methods in IP evaluation, we are develo** a new technology with machine learning as the core. We have built an online platform and will expand our business in the Greater Bay Area with plans.
△ Less
Submitted 17 August, 2022;
originally announced August 2022.
-
Semantic Interleaving Global Channel Attention for Multilabel Remote Sensing Image Classification
Authors:
Yongkun Liu,
Kesong Ni,
Yuhan Zhang,
Lijian Zhou,
Kun Zhao
Abstract:
Multi-Label Remote Sensing Image Classification (MLRSIC) has received increasing research interest. Taking the cooccurrence relationship of multiple labels as additional information helps to improve the performance of this task. Current methods focus on using it to constrain the final feature output of a Convolutional Neural Network (CNN). On the one hand, these methods do not make full use of lab…
▽ More
Multi-Label Remote Sensing Image Classification (MLRSIC) has received increasing research interest. Taking the cooccurrence relationship of multiple labels as additional information helps to improve the performance of this task. Current methods focus on using it to constrain the final feature output of a Convolutional Neural Network (CNN). On the one hand, these methods do not make full use of label correlation to form feature representation. On the other hand, they increase the label noise sensitivity of the system, resulting in poor robustness. In this paper, a novel method called Semantic Interleaving Global Channel Attention (SIGNA) is proposed for MLRSIC. First, the label co-occurrence graph is obtained according to the statistical information of the data set. The label co-occurrence graph is used as the input of the Graph Neural Network (GNN) to generate optimal feature representations. Then, the semantic features and visual features are interleaved, to guide the feature expression of the image from the original feature space to the semantic feature space with embedded label relations. SIGNA triggers global attention of feature maps channels in a new semantic feature space to extract more important visual features. Multihead SIGNA based feature adaptive weighting networks are proposed to act on any layer of CNN in a plug-and-play manner. For remote sensing images, better classification performance can be achieved by inserting CNN into the shallow layer. We conduct extensive experimental comparisons on three data sets: UCM data set, AID data set, and DFC15 data set. Experimental results demonstrate that the proposed SIGNA achieves superior classification performance compared to state-of-the-art (SOTA) methods. It is worth mentioning that the codes of this paper will be open to the community for reproducibility research. Our codes are available at https://github.com/kyle-one/SIGNA.
△ Less
Submitted 2 January, 2023; v1 submitted 4 August, 2022;
originally announced August 2022.
-
COSIME: FeFET based Associative Memory for In-Memory Cosine Similarity Search
Authors:
Che-Kai Liu,
Haobang Chen,
Mohsen Imani,
Kai Ni,
Arman Kazemi,
Ann Franchesca Laguna,
Michael Niemier,
Xiaobo Sharon Hu,
Liang Zhao,
Cheng Zhuo,
Xunzhao Yin
Abstract:
In a number of machine learning models, an input query is searched across the trained class vectors to find the closest feature class vector in cosine similarity metric. However, performing the cosine similarities between the vectors in Von-Neumann machines involves a large number of multiplications, Euclidean normalizations and division operations, thus incurring heavy hardware energy and latency…
▽ More
In a number of machine learning models, an input query is searched across the trained class vectors to find the closest feature class vector in cosine similarity metric. However, performing the cosine similarities between the vectors in Von-Neumann machines involves a large number of multiplications, Euclidean normalizations and division operations, thus incurring heavy hardware energy and latency overheads. Moreover, due to the memory wall problem that presents in the conventional architecture, frequent cosine similarity-based searches (CSSs) over the class vectors requires a lot of data movements, limiting the throughput and efficiency of the system. To overcome the aforementioned challenges, this paper introduces COSIME, an general in-memory associative memory (AM) engine based on the ferroelectric FET (FeFET) device for efficient CSS. By leveraging the one-transistor AND gate function of FeFET devices, current-based translinear analog circuit and winner-take-all (WTA) circuitry, COSIME can realize parallel in-memory CSS across all the entries in a memory block, and output the closest word to the input query in cosine similarity metric. Evaluation results at the array level suggest that the proposed COSIME design achieves 333X and 90.5X latency and energy improvements, respectively, and realizes better classification accuracy when compared with an AM design implementing approximated CSS. The proposed in-memory computing fabric is evaluated for an HDC problem, showcasing that COSIME can achieve on average 47.1X and 98.5X speedup and energy efficiency improvements compared with an GPU implementation.
△ Less
Submitted 25 July, 2022;
originally announced July 2022.
-
CMOS-Compatible Ising Machines built using Bistable Latches Coupled through Ferroelectric Transistor Arrays
Authors:
Antik Mallick,
Zijian Zhao,
Mohammad Khairul Bashar,
Shamiul Alam,
Md Mazharul Islam,
Yi Xiao,
Yixin Xu,
Ahmedullah Aziz,
Vijaykrishnan Narayanan,
Kai Ni,
Nikhil Shukla
Abstract:
Realizing compact and scalable Ising machines that are compatible with CMOS-process technology is crucial to the effectiveness and practicality of using such hardware platforms for accelerating computationally intractable problems. Besides the need for realizing compact Ising spins, the implementation of the coupling network, which describes the spin interaction, is also a potential bottleneck in…
▽ More
Realizing compact and scalable Ising machines that are compatible with CMOS-process technology is crucial to the effectiveness and practicality of using such hardware platforms for accelerating computationally intractable problems. Besides the need for realizing compact Ising spins, the implementation of the coupling network, which describes the spin interaction, is also a potential bottleneck in the scalability of such platforms. Therefore, in this work, we propose an Ising machine platform that exploits the novel behavior of compact bi-stable CMOS-latches (cross-coupled inverters) as classical Ising spins interacting through highly scalable and CMOS-process compatible ferroelectric-HfO2-based Ferroelectric FETs (FeFETs) which act as coupling elements. We experimentally demonstrate the prototype building blocks of this system, and evaluate the behavior of the scaled system using simulations. We project that the proposed architecture can compute Ising solutions with an efficiency of ~1.04 x 10^8 solutions/W/second. Our work not only provides a pathway to realizing CMOS-compatible designs but also to overcoming their scaling challenges.
△ Less
Submitted 29 May, 2022;
originally announced May 2022.
-
An Ultra-Compact Single FeFET Binary and Multi-Bit Associative Search Engine
Authors:
Xunzhao Yin,
Franz Müller,
Qingrong Huang,
Chao Li,
Mohsen Imani,
Zeyu Yang,
Jiahao Cai,
Maximilian Lederer,
Ricardo Olivo,
Nellie Laleni,
Shan Deng,
Zijian Zhao,
Cheng Zhuo,
Thomas Kämpfe,
Kai Ni
Abstract:
Content addressable memory (CAM) is widely used in associative search tasks for its highly parallel pattern matching capability. To accommodate the increasingly complex and data-intensive pattern matching tasks, it is critical to keep improving the CAM density to enhance the performance and area efficiency. In this work, we demonstrate: i) a novel ultra-compact 1FeFET CAM design that enables paral…
▽ More
Content addressable memory (CAM) is widely used in associative search tasks for its highly parallel pattern matching capability. To accommodate the increasingly complex and data-intensive pattern matching tasks, it is critical to keep improving the CAM density to enhance the performance and area efficiency. In this work, we demonstrate: i) a novel ultra-compact 1FeFET CAM design that enables parallel associative search and in-memory hamming distance calculation; ii) a multi-bit CAM for exact search using the same CAM cell; iii) compact device designs that integrate the series resistor current limiter into the intrinsic FeFET structure to turn the 1FeFET1R into an effective 1FeFET cell; iv) a successful 2-step search operation and a sufficient sensing margin of the proposed binary and multi-bit 1FeFET1R CAM array with sizes of practical interests in both experiments and simulations, given the existing unoptimized FeFET device variation; v) 89.9x speedup and 66.5x energy efficiency improvement over the state-of-the art alignment tools on GPU in accelerating genome pattern matching applications through the hyperdimensional computing paradigm.
△ Less
Submitted 15 March, 2022;
originally announced March 2022.
-
Hardware Functional Obfuscation With Ferroelectric Active Interconnects
Authors:
Tonggunag Yu,
Yixin Xu,
Shan Deng,
Zijian Zhao,
Nicolas Jao,
You Sung Kim,
Stefan Duenkel,
Sven Beyer,
Kai Ni,
Sumitha George,
Vijaykrishnan Narayanan
Abstract:
Camouflaging gate techniques are typically used in hardware security to prevent reverse engineering. Layout level camouflaging by adding dummy contacts ensures some level of protection against extracting the correct netlist. Threshold voltage manipulation for multi-functional logic with identical layouts has also been introduced for functional obfuscation. All these techniques are implemented at t…
▽ More
Camouflaging gate techniques are typically used in hardware security to prevent reverse engineering. Layout level camouflaging by adding dummy contacts ensures some level of protection against extracting the correct netlist. Threshold voltage manipulation for multi-functional logic with identical layouts has also been introduced for functional obfuscation. All these techniques are implemented at the expense of circuit-complexity and with significant area, energy, and delay penalty. In this paper, we propose an efficient hardware encryption technique with minimal complexity and overheads based on ferroelectric field-effect transistor (FeFET) active interconnects. The active interconnect provides run-time reconfigurable inverter-buffer logic by utilizing the threshold voltage programmability of the FeFETs. Our method utilizes only two FeFETs and an inverter to realize the masking function compared to recent reconfigurable logic gate implementations using several FeFETs and complex differential logic. We fabricate the proposed circuit and demonstrate the functionality. Judicious placement of the proposed logic in the IC makes it acts as a hardware encryption key and enables encoding and decoding of the functional output without affecting the critical path timing delay. Also, we achieve comparable encryption probability with a limited number of encryption units. In addition, we show a peripheral programming scheme for reconfigurable logic by reusing the existing scan chain logic, hence obviating the need for specialized programming logic and circuitry for keybit distribution. Our analysis shows an average encryption probability of 97.43% with an increase of 2.24%/ 3.67% delay for the most critical path/ sum of 100 critical paths delay for ISCAS85 benchmarks.
△ Less
Submitted 25 April, 2022; v1 submitted 7 October, 2021;
originally announced October 2021.
-
Deep Random Forest with Ferroelectric Analog Content Addressable Memory
Authors:
Xunzhao Yin,
Franz Müller,
Ann Franchesca Laguna,
Chao Li,
Wenwen Ye,
Qingrong Huang,
Qinming Zhang,
Zhiguo Shi,
Maximilian Lederer,
Nellie Laleni,
Shan Deng,
Zijian Zhao,
Michael Niemier,
Xiaobo Sharon Hu,
Cheng Zhuo,
Thomas Kämpfe,
Kai Ni
Abstract:
Deep random forest (DRF), which incorporates the core features of deep learning and random forest (RF), exhibits comparable classification accuracy, interpretability, and low memory and computational overhead when compared with deep neural networks (DNNs) in various information processing tasks for edge intelligence. However, the development of efficient hardware to accelerate DRF is lagging behin…
▽ More
Deep random forest (DRF), which incorporates the core features of deep learning and random forest (RF), exhibits comparable classification accuracy, interpretability, and low memory and computational overhead when compared with deep neural networks (DNNs) in various information processing tasks for edge intelligence. However, the development of efficient hardware to accelerate DRF is lagging behind its DNN counterparts. The key for hardware acceleration of DRF lies in efficiently realizing the branch-split operation at decision nodes when traversing a decision tree. In this work, we propose to implement DRF through simple associative searches realized with ferroelectric analog content addressable memory (ACAM). Utilizing only two ferroelectric field effect transistors (FeFETs), the ultra-compact ACAM cell can perform a branch-split operation with an energy-efficient associative search by storing the decision boundaries as the analog polarization states in an FeFET. The DRF accelerator architecture and the corresponding map** of the DRF model to the ACAM arrays are presented. The functionality, characteristics, and scalability of the FeFET ACAM based DRF and its robustness against FeFET device non-idealities are validated both in experiments and simulations. Evaluation results show that the FeFET ACAM DRF accelerator exhibits 10^6x/16x and 10^6x/2.5x improvements in terms of energy and latency when compared with other deep random forest hardware implementations on the state-of-the-art CPU/ReRAM, respectively.
△ Less
Submitted 6 October, 2021;
originally announced October 2021.
-
Intrinsic synaptic plasticity of ferroelectric field effect transistors for online learning
Authors:
Arnob Saha,
A N M Nafiul Islam,
Zijian Zhao,
Shan Deng,
Kai Ni,
Abhronil Sengupta
Abstract:
Nanoelectronic devices emulating neuro-synaptic functionalities through their intrinsic physics at low operating energies is imperative toward the realization of brain-like neuromorphic computers. In this work, we leverage the non-linear voltage dependent partial polarization switching of a ferroelectric field effect transistor to mimic plasticity characteristics of biological synapses. We provide…
▽ More
Nanoelectronic devices emulating neuro-synaptic functionalities through their intrinsic physics at low operating energies is imperative toward the realization of brain-like neuromorphic computers. In this work, we leverage the non-linear voltage dependent partial polarization switching of a ferroelectric field effect transistor to mimic plasticity characteristics of biological synapses. We provide experimental measurements of the synaptic characteristics for a $28nm$ high-k metal gate technology based device and develop an experimentally calibrated device model for large-scale system performance prediction. Decoupled read-write paths, ultra-low programming energies and the possibility of arranging such devices in a cross-point architecture demonstrate the synaptic efficacy of the device. Our hardware-algorithm co-design analysis reveals that the intrinsic plasticity of the ferroelectric devices has potential to enable unsupervised local learning in edge devices with limited training data.
△ Less
Submitted 16 September, 2021; v1 submitted 27 July, 2021;
originally announced July 2021.
-
Application-driven Design Exploration for Dense Ferroelectric Embedded Non-volatile Memories
Authors:
Mohammad Mehdi Sharifi,
Lillian Pentecost,
Ramin Rajaei,
Arman Kazemi,
Qiuwen Lou,
Gu-Yeon Wei,
David Brooks,
Kai Ni,
X. Sharon Hu,
Michael Niemier,
Marco Donato
Abstract:
The memory wall bottleneck is a key challenge across many data-intensive applications. Multi-level FeFET-based embedded non-volatile memories are a promising solution for denser and more energy-efficient on-chip memory. However, reliable multi-level cell storage requires careful optimizations to minimize the design overhead costs. In this work, we investigate the interplay between FeFET device cha…
▽ More
The memory wall bottleneck is a key challenge across many data-intensive applications. Multi-level FeFET-based embedded non-volatile memories are a promising solution for denser and more energy-efficient on-chip memory. However, reliable multi-level cell storage requires careful optimizations to minimize the design overhead costs. In this work, we investigate the interplay between FeFET device characteristics, programming schemes, and memory array architecture, and explore different design choices to optimize performance, energy, area, and accuracy metrics for critical data-intensive workloads. From our cross-stack design exploration, we find that we can store DNN weights and social network graphs at a density of over 8MB/mm^2 and sub-2ns read access latency without loss in application accuracy.
△ Less
Submitted 17 June, 2021;
originally announced June 2021.
-
Learning Improvised Chatbots from Adversarial Modifications of Natural Language Feedback
Authors:
Makesh Narsimhan Sreedhar,
Kun Ni,
Siva Reddy
Abstract:
The ubiquitous nature of chatbots and their interaction with users generate an enormous amount of data. Can we improve chatbots using this data? A self-feeding chatbot improves itself by asking natural language feedback when a user is dissatisfied with its response and uses this feedback as an additional training sample. However, user feedback in most cases contains extraneous sequences hindering…
▽ More
The ubiquitous nature of chatbots and their interaction with users generate an enormous amount of data. Can we improve chatbots using this data? A self-feeding chatbot improves itself by asking natural language feedback when a user is dissatisfied with its response and uses this feedback as an additional training sample. However, user feedback in most cases contains extraneous sequences hindering their usefulness as a training sample. In this work, we propose a generative adversarial model that converts noisy feedback into a plausible natural response in a conversation. The generator's goal is to convert the feedback into a response that answers the user's previous utterance and to fool the discriminator which distinguishes feedback from natural responses. We show that augmenting original training data with these modified feedback responses improves the original chatbot performance from 69.94% to 75.96% in ranking correct responses on the Personachat dataset, a large improvement given that the original model is already trained on 131k samples.
△ Less
Submitted 14 October, 2020; v1 submitted 14 October, 2020;
originally announced October 2020.
-
FeCAM: A Universal Compact Digital and Analog Content Addressable Memory Using Ferroelectric
Authors:
Xunzhao Yin,
Chao Li,
Qingrong Huang,
Li Zhang,
Michael Niemier,
Xiaobo Sharon Hu,
Cheng Zhuo,
Kai Ni
Abstract:
Ferroelectric field effect transistors (FeFETs) are being actively investigated with the potential for in-memory computing (IMC) over other non-volatile memories (NVMs). Content Addressable Memories (CAMs) are a form of IMC that performs parallel searches for matched entries over a memory array for a given input query. CAMs are widely used for data-centric applications that involve pattern matchin…
▽ More
Ferroelectric field effect transistors (FeFETs) are being actively investigated with the potential for in-memory computing (IMC) over other non-volatile memories (NVMs). Content Addressable Memories (CAMs) are a form of IMC that performs parallel searches for matched entries over a memory array for a given input query. CAMs are widely used for data-centric applications that involve pattern matching and search functionality. To accommodate the ever expanding data, it is attractive to resort to analog CAM for memory density improvement. However, the digital CAM design nowadays based on standard CMOS or emerging nonvolatile memories (e.g., resistive storage devices) is already challenging due to area, power, and cost penalties. Thus, it can be extremely expensive to achieve analog CAM with those technologies due to added cell components. As such, we propose, for the first time, a universal compact FeFET based CAM design, FeCAM, with search and storage functionality enabled in digital and analog domain simultaneously. By exploiting the multi-level-cell (MLC) states of FeFET, FeCAM can store and search inputs in either digital or analog domain. We perform a device-circuit co-design of the proposed FeCAM and validate its functionality and performance using an experimentally calibrated FeFET model. Circuit level simulation results demonstrate that FeCAM can either store continuous matching ranges or encode 3-bit data in a single CAM cell. When compared with the existing digital CMOS based CAM approaches, FeCAM is found to improve both memory density by 22.4X and energy saving by 8.6/3.2X for analog/digital modes, respectively. In the CAM-related application, our evaluations show that FeCAM can achieve 60.5X/23.1X saving in area/search energy compared with conventional CMOS based CAMs.
△ Less
Submitted 17 July, 2020; v1 submitted 4 April, 2020;
originally announced April 2020.
-
A Hybrid FeMFET-CMOS Analog Synapse Circuit for Neural Network Training and Inference
Authors:
Arman Kazemi,
Ramin Rajaei,
Kai Ni,
Suman Datta,
Michael Niemier,
X. Sharon Hu
Abstract:
An analog synapse circuit based on ferroelectric-metal field-effect transistors is proposed, that offers 6-bit weight precision. The circuit is comprised of volatile least significant bits (LSBs) used solely during training, and non-volatile most significant bits (MSBs) used for both training and inference. The design works at a 1.8V logic-compatible voltage, provides 10^10 endurance cycles, and r…
▽ More
An analog synapse circuit based on ferroelectric-metal field-effect transistors is proposed, that offers 6-bit weight precision. The circuit is comprised of volatile least significant bits (LSBs) used solely during training, and non-volatile most significant bits (MSBs) used for both training and inference. The design works at a 1.8V logic-compatible voltage, provides 10^10 endurance cycles, and requires only 250ps update pulses. A variant of LeNet trained with the proposed synapse achieves 98.2% accuracy on MNIST, which is only 0.4% lower than an ideal implementation of the same network with the same bit precision. Furthermore, the proposed synapse offers improvements of up to 26% in area, 44.8% in leakage power, 16.7% in LSB update pulse duration, and two orders of magnitude in endurance cycles, when compared to state-of-the-art hybrid synaptic circuits. Our proposed synapse can be extended to an 8-bit design, enabling a VGG-like network to achieve 88.8% accuracy on CIFAR-10 (only 0.8% lower than an ideal implementation of the same network).
△ Less
Submitted 1 April, 2020;
originally announced April 2020.
-
SWAG: Item Recommendations using Convolutions on Weighted Graphs
Authors:
Amit Pande,
Kai Ni,
Venkataramani Kini
Abstract:
Recent advancements in deep neural networks for graph-structured data have led to state-of-the-art performance on recommender system benchmarks. In this work, we present a Graph Convolutional Network (GCN) algorithm SWAG (Sample Weight and AGgregate), which combines efficient random walks and graph convolutions on weighted graphs to generate embeddings for nodes (items) that incorporate both graph…
▽ More
Recent advancements in deep neural networks for graph-structured data have led to state-of-the-art performance on recommender system benchmarks. In this work, we present a Graph Convolutional Network (GCN) algorithm SWAG (Sample Weight and AGgregate), which combines efficient random walks and graph convolutions on weighted graphs to generate embeddings for nodes (items) that incorporate both graph structure as well as node feature information such as item-descriptions and item-images. The three important SWAG operations that enable us to efficiently generate node embeddings based on graph structures are (a) Sampling of graph to homogeneous structure, (b) Weighting the sampling, walks and convolution operations, and (c) using AGgregation functions for generating convolutions. The work is an adaptation of graphSAGE over weighted graphs. We deploy SWAG at Target and train it on a graph of more than 500K products sold online with over 50M edges. Offline and online evaluations reveal the benefit of using a graph-based approach and the benefits of weighing to produce high quality embeddings and product recommendations.
△ Less
Submitted 22 November, 2019;
originally announced November 2019.
-
The Determinants For User Intention To Adopt Web Based Early Childhood Supplementary Educational Platform
Authors:
Khoo Han Ni,
Ahmad Suhaimi Baharudin,
Kamal Karkonasasi
Abstract:
Education is defined as the fundamental key for the success and development for any given society. Hence, the forming of a strong basic educational foundation is crucial to ensure the children to stay competitive and achieve extraordinary in the changing world. I-zLink Sdn. Bhd. is a partnership company that formed by four individuals with the commitment to develop an early childhood learning solu…
▽ More
Education is defined as the fundamental key for the success and development for any given society. Hence, the forming of a strong basic educational foundation is crucial to ensure the children to stay competitive and achieve extraordinary in the changing world. I-zLink Sdn. Bhd. is a partnership company that formed by four individuals with the commitment to develop an early childhood learning solution named i-Future. The system, i-Future is a Supplementary Educational Platform that designs specially for children that fall between the age categories of two to six years which integrate with the latest technology to increase the engagement and flexibility in learning process as well as the effectiveness of study. In order to ensure that the system has a market value, an empirical study was conducted to determine the acceptance level and also the variables that would affect the user's intention to adopt i-Future. The variables used in the study include Perceived Ease of Use (PEU), Perceived Usefulness (PU), System Quality (SQ) and Social Norm (SN). Through the quantitative study, the relationship between these variables and user's intention to adopt i-Future was determined. Those variables which have significant impact on user's intention to adopt i-Future will be used to design i-Future.
△ Less
Submitted 26 June, 2018;
originally announced June 2018.
-
Deep Speech Denoising with Vector Space Projections
Authors:
Jeff Hetherly,
Paul Gamble,
Maria Barrios,
Cory Stephenson,
Karl Ni
Abstract:
We propose an algorithm to denoise speakers from a single microphone in the presence of non-stationary and dynamic noise. Our approach is inspired by the recent success of neural network models separating speakers from other speakers and singers from instrumental accompaniment. Unlike prior art, we leverage embedding spaces produced with source-contrastive estimation, a technique derived from nega…
▽ More
We propose an algorithm to denoise speakers from a single microphone in the presence of non-stationary and dynamic noise. Our approach is inspired by the recent success of neural network models separating speakers from other speakers and singers from instrumental accompaniment. Unlike prior art, we leverage embedding spaces produced with source-contrastive estimation, a technique derived from negative sampling techniques in natural language processing, while simultaneously obtaining a continuous inference mask. Our embedding space directly optimizes for the discrimination of speaker and noise by jointly modeling their characteristics. This space is generalizable in that it is not speaker or noise specific and is capable of denoising speech even if the model has not seen the speaker in the training set. Parameters are trained with dual objectives: one that promotes a selective bandpass filter that eliminates noise at time-frequency positions that exceed signal power, and another that proportionally splits time-frequency content between signal and noise. We compare to state of the art algorithms as well as traditional sparse non-negative matrix factorization solutions. The resulting algorithm avoids severe computational burden by providing a more intuitive and easily optimized approach, while achieving competitive accuracy.
△ Less
Submitted 27 April, 2018;
originally announced April 2018.
-
Voices Obscured in Complex Environmental Settings (VOICES) corpus
Authors:
Colleen Richey,
Maria A. Barrios,
Zeb Armstrong,
Chris Bartels,
Horacio Franco,
Martin Graciarena,
Aaron Lawson,
Mahesh Kumar Nandwana,
Allen Stauffer,
Julien van Hout,
Paul Gamble,
Jeff Hetherly,
Cory Stephenson,
Karl Ni
Abstract:
This paper introduces the Voices Obscured In Complex Environmental Settings (VOICES) corpus, a freely available dataset under Creative Commons BY 4.0. This dataset will promote speech and signal processing research of speech recorded by far-field microphones in noisy room conditions. Publicly available speech corpora are mostly composed of isolated speech at close-range microphony. A typical appro…
▽ More
This paper introduces the Voices Obscured In Complex Environmental Settings (VOICES) corpus, a freely available dataset under Creative Commons BY 4.0. This dataset will promote speech and signal processing research of speech recorded by far-field microphones in noisy room conditions. Publicly available speech corpora are mostly composed of isolated speech at close-range microphony. A typical approach to better represent realistic scenarios, is to convolve clean speech with noise and simulated room response for model training. Despite these efforts, model performance degrades when tested against uncurated speech in natural conditions. For this corpus, audio was recorded in furnished rooms with background noise played in conjunction with foreground speech selected from the LibriSpeech corpus. Multiple sessions were recorded in each room to accommodate for all foreground speech-background noise combinations. Audio was recorded using twelve microphones placed throughout the room, resulting in 120 hours of audio per microphone. This work is a multi-organizational effort led by SRI International and Lab41 with the intent to push forward state-of-the-art distant microphone approaches in signal processing and speech recognition.
△ Less
Submitted 15 May, 2018; v1 submitted 13 April, 2018;
originally announced April 2018.
-
PySEAL: A Python wrapper implementation of the SEAL homomorphic encryption library
Authors:
Alexander J. Titus,
Shashwat Kishore,
Todd Stavish,
Stephanie M. Rogers,
Karl Ni
Abstract:
Motivation: The ability to perform operations on encrypted data has a growing number of applications in bioinformatics, with implications for data privacy in health care and biosecurity. The SEAL library is a popular implementation of fully homomorphic encryption developed in C++ by Microsoft Research. Despite the advantages of C++, Python is a flexible and dominant programming language that enabl…
▽ More
Motivation: The ability to perform operations on encrypted data has a growing number of applications in bioinformatics, with implications for data privacy in health care and biosecurity. The SEAL library is a popular implementation of fully homomorphic encryption developed in C++ by Microsoft Research. Despite the advantages of C++, Python is a flexible and dominant programming language that enables rapid prototy** of bioinformatics pipelines.
Results: In an effort to make homomorphic encryption accessible to a broader range of bioinformatics scientists and applications, we present a Python binding implementation of the popular homomorphic encryption library, SEAL, using pybind11. The software contains a Docker image to facilitate easy installation and execution of the SEAL build process.
Availability: All code is publicly available at https://github.com/Lab41/PySEAL Contact: [email protected] Supplementary information: Supplementary information is available on the Lab41 GitHub.
△ Less
Submitted 5 March, 2018;
originally announced March 2018.
-
Audio Concept Classification with Hierarchical Deep Neural Networks
Authors:
Mirco Ravanelli,
Benjamin Elizalde,
Karl Ni,
Gerald Friedland
Abstract:
Audio-based multimedia retrieval tasks may identify semantic information in audio streams, i.e., audio concepts (such as music, laughter, or a revving engine). Conventional Gaussian-Mixture-Models have had some success in classifying a reduced set of audio concepts. However, multi-class classification can benefit from context window analysis and the discriminating power of deeper architectures. Al…
▽ More
Audio-based multimedia retrieval tasks may identify semantic information in audio streams, i.e., audio concepts (such as music, laughter, or a revving engine). Conventional Gaussian-Mixture-Models have had some success in classifying a reduced set of audio concepts. However, multi-class classification can benefit from context window analysis and the discriminating power of deeper architectures. Although deep learning has shown promise in various applications such as speech and object recognition, it has not yet met the expectations for other fields such as audio concept classification. This paper explores, for the first time, the potential of deep learning in classifying audio concepts on User-Generated Content videos. The proposed system is comprised of two cascaded neural networks in a hierarchical configuration to analyze the short- and long-term context information. Our system outperforms a GMM approach by a relative 54%, a Neural Network by 33%, and a Deep Neural Network by 12% on the TRECVID-MED database
△ Less
Submitted 11 October, 2017;
originally announced October 2017.
-
Learning to Explain Non-Standard English Words and Phrases
Authors:
Ke Ni,
William Yang Wang
Abstract:
We describe a data-driven approach for automatically explaining new, non-standard English expressions in a given sentence, building on a large dataset that includes 15 years of crowdsourced examples from UrbanDictionary.com. Unlike prior studies that focus on matching keywords from a slang dictionary, we investigate the possibility of learning a neural sequence-to-sequence model that generates exp…
▽ More
We describe a data-driven approach for automatically explaining new, non-standard English expressions in a given sentence, building on a large dataset that includes 15 years of crowdsourced examples from UrbanDictionary.com. Unlike prior studies that focus on matching keywords from a slang dictionary, we investigate the possibility of learning a neural sequence-to-sequence model that generates explanations of unseen non-standard English expressions given context. We propose a dual encoder approach---a word-level encoder learns the representation of context, and a second character-level encoder to learn the hidden representation of the target non-standard expression. Our model can produce reasonable definitions of new non-standard English expressions given their context with certain confidence.
△ Less
Submitted 26 September, 2017;
originally announced September 2017.
-
Monaural Audio Speaker Separation with Source Contrastive Estimation
Authors:
Cory Stephenson,
Patrick Callier,
Abhinav Ganesh,
Karl Ni
Abstract:
We propose an algorithm to separate simultaneously speaking persons from each other, the "cocktail party problem", using a single microphone. Our approach involves a deep recurrent neural networks regression to a vector space that is descriptive of independent speakers. Such a vector space can embed empirically determined speaker characteristics and is optimized by distinguishing between speaker m…
▽ More
We propose an algorithm to separate simultaneously speaking persons from each other, the "cocktail party problem", using a single microphone. Our approach involves a deep recurrent neural networks regression to a vector space that is descriptive of independent speakers. Such a vector space can embed empirically determined speaker characteristics and is optimized by distinguishing between speaker masks. We call this technique source-contrastive estimation. The methodology is inspired by negative sampling, which has seen success in natural language processing, where an embedding is learned by correlating and de-correlating a given input vector with output weights. Although the matrix determined by the output weights is dependent on a set of known speakers, we only use the input vectors during inference. Doing so will ensure that source separation is explicitly speaker-independent. Our approach is similar to recent deep neural network clustering and permutation-invariant training research; we use weighted spectral features and masks to augment individual speaker frequencies while filtering out other speakers. We avoid, however, the severe computational burden of other approaches with our technique. Furthermore, by training a vector space rather than combinations of different speakers or differences thereof, we avoid the so-called permutation problem during training. Our algorithm offers an intuitive, computationally efficient response to the cocktail party problem, and most importantly boasts better empirical performance than other current techniques.
△ Less
Submitted 12 May, 2017;
originally announced May 2017.
-
Sampled Image Tagging and Retrieval Methods on User Generated Content
Authors:
Karl Ni,
Kyle Zaragoza,
Charles Foster,
Carmen Carrano,
Barry Chen,
Yonas Tesfaye,
Alex Gude
Abstract:
Traditional image tagging and retrieval algorithms have limited value as a result of being trained with heavily curated datasets. These limitations are most evident when arbitrary search words are used that do not intersect with training set labels. Weak labels from user generated content (UGC) found in the wild (e.g., Google Photos, FlickR, etc.) have an almost unlimited number of unique words in…
▽ More
Traditional image tagging and retrieval algorithms have limited value as a result of being trained with heavily curated datasets. These limitations are most evident when arbitrary search words are used that do not intersect with training set labels. Weak labels from user generated content (UGC) found in the wild (e.g., Google Photos, FlickR, etc.) have an almost unlimited number of unique words in the metadata tags. Prior work on word embeddings successfully leveraged unstructured text with large vocabularies, and our proposed method seeks to apply similar cost functions to open source imagery. Specifically, we train a deep learning image tagging and retrieval system on large scale, user generated content (UGC) using sampling methods and joint optimization of word embeddings. By using the Yahoo! FlickR Creative Commons (YFCC100M) dataset, such an approach builds robustness to common unstructured data issues that include but are not limited to irrelevant tags, misspellings, multiple languages, polysemy, and tag imbalance. As a result, the final proposed algorithm will not only yield comparable results to state of the art in conventional image tagging, but will enable new capability to train algorithms on large, scale unstructured text in the YFCC100M dataset and outperform cited work in zero-shot capability.
△ Less
Submitted 2 December, 2016; v1 submitted 21 November, 2016;
originally announced November 2016.
-
Map** Images to Sentiment Adjective Noun Pairs with Factorized Neural Nets
Authors:
Takuya Narihira,
Damian Borth,
Stella X. Yu,
Karl Ni,
Trevor Darrell
Abstract:
We consider the visual sentiment task of map** an image to an adjective noun pair (ANP) such as "cute baby". To capture the two-factor structure of our ANP semantics as well as to overcome annotation noise and ambiguity, we propose a novel factorized CNN model which learns separate representations for adjectives and nouns but optimizes the classification performance over their product. Our exper…
▽ More
We consider the visual sentiment task of map** an image to an adjective noun pair (ANP) such as "cute baby". To capture the two-factor structure of our ANP semantics as well as to overcome annotation noise and ambiguity, we propose a novel factorized CNN model which learns separate representations for adjectives and nouns but optimizes the classification performance over their product. Our experiments on the publicly available SentiBank dataset show that our model significantly outperforms not only independent ANP classifiers on unseen ANPs and on retrieving images of novel ANPs, but also image captioning models which capture word semantics from co-occurrence of natural text; the latter turn out to be surprisingly poor at capturing the sentiment evoked by pure visual experience. That is, our factorized ANP CNN not only trains better from noisy labels, generalizes better to new images, but can also expands the ANP vocabulary on its own.
△ Less
Submitted 20 November, 2015;
originally announced November 2015.
-
YFCC100M: The New Data in Multimedia Research
Authors:
Bart Thomee,
David A. Shamma,
Gerald Friedland,
Benjamin Elizalde,
Karl Ni,
Douglas Poland,
Damian Borth,
Li-Jia Li
Abstract:
We present the Yahoo Flickr Creative Commons 100 Million Dataset (YFCC100M), the largest public multimedia collection that has ever been released. The dataset contains a total of 100 million media objects, of which approximately 99.2 million are photos and 0.8 million are videos, all of which carry a Creative Commons license. Each media object in the dataset is represented by several pieces of met…
▽ More
We present the Yahoo Flickr Creative Commons 100 Million Dataset (YFCC100M), the largest public multimedia collection that has ever been released. The dataset contains a total of 100 million media objects, of which approximately 99.2 million are photos and 0.8 million are videos, all of which carry a Creative Commons license. Each media object in the dataset is represented by several pieces of metadata, e.g. Flickr identifier, owner name, camera, title, tags, geo, media source. The collection provides a comprehensive snapshot of how photos and videos were taken, described, and shared over the years, from the inception of Flickr in 2004 until early 2014. In this article we explain the rationale behind its creation, as well as the implications the dataset has for science, research, engineering, and development. We further present several new challenges in multimedia research that can now be expanded upon with our dataset.
△ Less
Submitted 25 April, 2016; v1 submitted 5 March, 2015;
originally announced March 2015.
-
Large-Scale Deep Learning on the YFCC100M Dataset
Authors:
Karl Ni,
Roger Pearce,
Kofi Boakye,
Brian Van Essen,
Damian Borth,
Barry Chen,
Eric Wang
Abstract:
We present a work-in-progress snapshot of learning with a 15 billion parameter deep learning network on HPC architectures applied to the largest publicly available natural image and video dataset released to-date. Recent advancements in unsupervised deep neural networks suggest that scaling up such networks in both model and training dataset size can yield significant improvements in the learning…
▽ More
We present a work-in-progress snapshot of learning with a 15 billion parameter deep learning network on HPC architectures applied to the largest publicly available natural image and video dataset released to-date. Recent advancements in unsupervised deep neural networks suggest that scaling up such networks in both model and training dataset size can yield significant improvements in the learning of concepts at the highest layers. We train our three-layer deep neural network on the Yahoo! Flickr Creative Commons 100M dataset. The dataset comprises approximately 99.2 million images and 800,000 user-created videos from Yahoo's Flickr image and video sharing platform. Training of our network takes eight days on 98 GPU nodes at the High Performance Computing Center at Lawrence Livermore National Laboratory. Encouraging preliminary results and future research directions are presented and discussed.
△ Less
Submitted 11 February, 2015;
originally announced February 2015.