Showing 1–50 of 51 results for author: Hu, X S

  1. arXiv:2406.06544  [pdf, other]

    cs.AR cs.AI

    TSB: Tiny Shared Block for Efficient DNN Deployment on NVCIM Accelerators

    Authors: Yifan Qin, Zheyu Yan, Zixuan Pan, Wujie Wen, Xiaobo Sharon Hu, Yiyu Shi

    Abstract: Compute-in-memory (CIM) accelerators using non-volatile memory (NVM) devices offer promising solutions for energy-efficient and low-latency Deep Neural Network (DNN) inference execution. However, practical deployment is often hindered by the challenge of dealing with the massive amount of model weight parameters impacted by the inherent device variations within non-volatile computing-in-memory (NV… ▽ More

    Submitted 8 May, 2024; originally announced June 2024.

  2. arXiv:2403.03442  [pdf, other]

    cs.AR

    CAMASim: A Comprehensive Simulation Framework for Content-Addressable Memory based Accelerators

    Authors: Mengyuan Li, Shiyi Liu, Mohammad Mehdi Sharifi, X. Sharon Hu

    Abstract: Content addressable memory (CAM) stands out as an efficient hardware solution for memory-intensive search operations by supporting parallel computation in memory. However, developing a CAM-based accelerator architecture that achieves acceptable accuracy, while minimizing hardware cost and catering to both exact and approximate search, still presents a significant challenge especially when consider… ▽ More

    Submitted 7 March, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

  3. arXiv:2402.15824  [pdf, other]

    cs.CR cs.AR

    A New Secure Memory System for Efficient Data Protection and Access Pattern Obfuscation

    Authors: Haoran Geng, Yuezhi Che, Aaron Dingler, Michael Niemier, Xiaobo Sharon Hu

    Abstract: As the reliance on secure memory environments permeates across applications, memory encryption is used to ensure memory security. However, most effective encryption schemes, such as the widely used AES-CTR, inherently introduce extra overheads, including those associated with counter storage and version number integrity checks. Moreover, encryption only protects data content, and it does not fully… ▽ More

    Submitted 24 February, 2024; originally announced February 2024.

  4. arXiv:2401.07378  [pdf, other]

    cs.CV cs.AI

    Efficient approximation of Earth Mover's Distance Based on Nearest Neighbor Search

    Authors: Guangyu Meng, Ruyu Zhou, Liu Liu, Peixian Liang, Fang Liu, Danny Chen, Michael Niemier, X. Sharon Hu

    Abstract: Earth Mover's Distance (EMD) is an important similarity measure between two distributions, used in computer vision and many other application domains. However, its exact calculation is computationally and memory intensive, which hinders its scalability and applicability for large-scale problems. Various approximate EMD algorithms have been proposed to reduce computational costs, but they suffer lo… ▽ More

    Submitted 19 January, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

  5. arXiv:2401.05357  [pdf, other]

    cs.AR cs.LG

    U-SWIM: Universal Selective Write-Verify for Computing-in-Memory Neural Accelerators

    Authors: Zheyu Yan, Xiaobo Sharon Hu, Yiyu Shi

    Abstract: Architectures that incorporate Computing-in-Memory (CiM) using emerging non-volatile memory (NVM) devices have become strong contenders for deep neural network (DNN) acceleration due to their impressive energy efficiency. Yet, a significant challenge arises when using these emerging devices: they can show substantial variations during the weight-mapping process. This can severely impact DNN accura… ▽ More

    Submitted 11 December, 2023; originally announced January 2024.

  6. arXiv:2312.06137  [pdf, other]

    cs.LG cs.AR

    Compute-in-Memory based Neural Network Accelerators for Safety-Critical Systems: Worst-Case Scenarios and Protections

    Authors: Zheyu Yan, Xiaobo Sharon Hu, Yiyu Shi

    Abstract: Emerging non-volatile memory (NVM)-based Computing-in-Memory (CiM) architectures show substantial promise in accelerating deep neural networks (DNNs) due to their exceptional energy efficiency. However, NVM devices are prone to device variations. Consequently, the actual DNN weights mapped to NVM devices can differ considerably from their targeted values, inducing significant performance degradati… ▽ More

    Submitted 11 December, 2023; originally announced December 2023.

  7. arXiv:2311.17852  [pdf, other]

    cs.AR

    A Computing-in-Memory-based One-Class Hyperdimensional Computing Model for Outlier Detection

    Authors: Ruixuan Wang, Sabrina Hassan Moon, Xiaobo Sharon Hu, Xun Jiao, Dayane Reis

    Abstract: In this work, we present ODHD, an algorithm for outlier detection based on hyperdimensional computing (HDC), a non-classical learning paradigm. Along with the HDC-based algorithm, we propose IM-ODHD, a computing-in-memory (CiM) implementation based on hardware/software (HW/SW) codesign for improved latency and energy efficiency. The training and testing phases of ODHD may be performed with convent… ▽ More

    Submitted 22 February, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

  8. arXiv:2310.04940  [pdf, other]

    cs.AR

    SEE-MCAM: Scalable Multi-bit FeFET Content Addressable Memories for Energy Efficient Associative Search

    Authors: Shengxi Shou, Che-Kai Liu, Sanggeon Yun, Zishen Wan, Kai Ni, Mohsen Imani, X. Sharon Hu, Jianyi Yang, Cheng Zhuo, Xunzhao Yin

    Abstract: In this work, we propose SEE-MCAM, scalable and compact multi-bit CAM (MCAM) designs that utilize the three-terminal ferroelectric FET (FeFET) as the proxy. By exploiting the multi-level-cell characteristics of FeFETs, our proposed SEE-MCAM designs enable multi-bit associative search functions and achieve better energy efficiency and performance than existing FeFET-based CAM designs. We validated… ▽ More

    Submitted 7 October, 2023; originally announced October 2023.

    Comments: Accepted by International Conference on Computer-Aided Design (ICCAD), 2023

  9. arXiv:2309.06418  [pdf, other]

    cs.AR

    C4CAM: A Compiler for CAM-based In-memory Accelerators

    Authors: Hamid Farzaneh, João Paulo Cardoso de Lima, Mengyuan Li, Asif Ali Khan, Xiaobo Sharon Hu, Jeronimo Castrillon

    Abstract: Machine learning and data analytics applications increasingly suffer from the high latency and energy consumption of conventional von Neumann architectures. Recently, several in-memory and near-memory systems have been proposed to remove this von Neumann bottleneck. Platforms based on content-addressable memories (CAMs) are particularly interesting due to their efficient support for the search-bas… ▽ More

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: 10 pages, 9 figures

  10. arXiv:2308.02648  [pdf, other]

    cs.CR cs.AR

    Privacy Preserving In-memory Computing Engine

    Authors: Haoran Geng, Jianqiao Mo, Dayane Reis, Jonathan Takeshita, Taeho Jung, Brandon Reagen, Michael Niemier, Xiaobo Sharon Hu

    Abstract: Privacy has rapidly become a major concern/design consideration. Homomorphic Encryption (HE) and Garbled Circuits (GC) are privacy-preserving techniques that support computations on encrypted data. HE and GC can complement each other, as HE is more efficient for linear operations, while GC is more effective for non-linear operations. Together, they enable complex computing tasks, such as machine l… ▽ More

    Submitted 10 August, 2023; v1 submitted 4 August, 2023; originally announced August 2023.

  11. arXiv:2307.15853  [pdf, other]

    cs.LG cs.ET

    Improving Realistic Worst-Case Performance of NVCiM DNN Accelerators through Training with Right-Censored Gaussian Noise

    Authors: Zheyu Yan, Yifan Qin, Wujie Wen, Xiaobo Sharon Hu, Yiyu Shi

    Abstract: Compute-in-Memory (CiM), built upon non-volatile memory (NVM) devices, is promising for accelerating deep neural networks (DNNs) owing to its in-situ data processing capability and superior energy efficiency. Unfortunately, the well-trained model parameters, after being mapped to NVM devices, can often exhibit large deviations from their intended values due to device variations, resulting in notab… ▽ More

    Submitted 28 July, 2023; originally announced July 2023.

  12. arXiv:2307.14557  [pdf, other]

    cs.CR cs.AR

    Accelerating Polynomial Modular Multiplication with Crossbar-Based Compute-in-Memory

    Authors: Mengyuan Li, Haoran Geng, Michael Niemier, Xiaobo Sharon Hu

    Abstract: Lattice-based cryptographic algorithms built on ring learning with error theory are gaining importance due to their potential for providing post-quantum security. However, these algorithms involve complex polynomial operations, such as polynomial modular multiplication (PMM), which is the most time-consuming part of these algorithms. Accelerating PMM is crucial to make lattice-based cryptographic… ▽ More

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: Accepted by 42nd International Conference on Computer-Aided Design (ICCAD)

  13. arXiv:2306.06923  [pdf, other]

    cs.LG cs.AR

    On the Viability of using LLMs for SW/HW Co-Design: An Example in Designing CiM DNN Accelerators

    Authors: Zheyu Yan, Yifan Qin, Xiaobo Sharon Hu, Yiyu Shi

    Abstract: Deep Neural Networks (DNNs) have demonstrated impressive performance across a wide range of tasks. However, deploying DNNs on edge devices poses significant challenges due to stringent power and computational budgets. An effective solution to this issue is software-hardware (SW-HW) co-design, which allows for the tailored creation of DNN models and hardware architectures that optimally utilize ava… ▽ More

    Submitted 12 June, 2023; originally announced June 2023.

  14. arXiv:2305.14561  [pdf, other]

    cs.LG cs.AI cs.AR

    Negative Feedback Training: A Novel Concept to Improve Robustness of NVCIM DNN Accelerators

    Authors: Yifan Qin, Zheyu Yan, Wujie Wen, Xiaobo Sharon Hu, Yiyu Shi

    Abstract: Compute-in-memory (CIM) accelerators built upon non-volatile memory (NVM) devices excel in energy efficiency and latency when performing Deep Neural Network (DNN) inference, thanks to their in-situ data processing capability. However, the stochastic nature and intrinsic variations of NVM devices often result in performance degradation in DNN inference. Introducing these non-ideal device behaviors… ▽ More

    Submitted 12 April, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

  15. arXiv:2304.03868  [pdf, other]

    cs.ET

    Compact and High-Performance TCAM Based on Scaled Double-Gate FeFETs

    Authors: Liu Liu, Shubham Kumar, Simon Thomann, Yogesh Singh Chauhan, Hussam Amrouch, Xiaobo Sharon Hu

    Abstract: Ternary content addressable memory (TCAM), widely used in network routers and high-associativity caches, is gaining popularity in machine learning and data-analytic applications. Ferroelectric FETs (FeFETs) are a promising candidate for implementing TCAM owing to their high ON/OFF ratio, non-volatility, and CMOS compatibility. However, conventional single-gate FeFETs (SG-FeFETs) suffer from relati… ▽ More

    Submitted 13 April, 2023; v1 submitted 7 April, 2023; originally announced April 2023.

    Comments: Accepted by Design Automation Conference (DAC) 2023

  16. arXiv:2212.00089  [pdf, other]

    cs.AR cs.ET

    Ferroelectric FET based Context-Switching FPGA Enabling Dynamic Reconfiguration for Adaptive Deep Learning Machines

    Authors: Yixin Xu, Zijian Zhao, Yi Xiao, Tongguang Yu, Halid Mulaosmanovic, Dominik Kleimaier, Stefan Duenkel, Sven Beyer, Xiao Gong, Rajiv Joshi, X. Sharon Hu, Shixian Wen, Amanda Sofie Rios, Kiran Lekkala, Laurent Itti, Eric Homan, Sumitha George, Vijaykrishnan Narayanan, Kai Ni

    Abstract: Field Programmable Gate Array (FPGA) is widely used in acceleration of deep learning applications because of its reconfigurability, flexibility, and fast time-to-market. However, conventional FPGA suffers from the tradeoff between chip area and reconfiguration latency, making efficient FPGA accelerations that require switching between multiple configurations still elusive. In this paper, we perfor… ▽ More

    Submitted 30 November, 2022; originally announced December 2022.

    Comments: 54 pages, 15 figures

  17. arXiv:2209.04161  [pdf, other]

    cs.AR cs.AI cs.LG

    ApproxTrain: Fast Simulation of Approximate Multipliers for DNN Training and Inference

    Authors: Jing Gong, Hassaan Saadat, Hasindu Gamaarachchi, Haris Javaid, Xiaobo Sharon Hu, Sri Parameswaran

    Abstract: Edge training of Deep Neural Networks (DNNs) is a desirable goal for continuous learning; however, it is hindered by the enormous computational power required by training. Hardware approximate multipliers have shown their effectiveness for gaining resource-efficiency in DNN inference accelerators; however, training with approximate multipliers is largely unexplored. To build resource efficient acc… ▽ More

    Submitted 23 September, 2022; v1 submitted 9 September, 2022; originally announced September 2022.

    Comments: 14 pages, 12 figures

  18. arXiv:2209.01527  [pdf, other]

    cs.CV

    Data-Driven Deep Supervision for Skin Lesion Classification

    Authors: Suraj Mishra, Yizhe Zhang, Li Zhang, Tianyu Zhang, X. Sharon Hu, Danny Z. Chen

    Abstract: Automatic classification of pigmented, non-pigmented, and depigmented non-melanocytic skin lesions have garnered lots of attention in recent years. However, imaging variations in skin texture, lesion shape, depigmentation contrast, lighting condition, etc. hinder robust feature extraction, affecting classification accuracy. In this paper, we propose a new deep neural network that exploits input da… ▽ More

    Submitted 3 September, 2022; originally announced September 2022.

    Comments: MICCAI 2022

  19. arXiv:2207.12188  [pdf, other]

    cs.AR cs.ET

    COSIME: FeFET based Associative Memory for In-Memory Cosine Similarity Search

    Authors: Che-Kai Liu, Haobang Chen, Mohsen Imani, Kai Ni, Arman Kazemi, Ann Franchesca Laguna, Michael Niemier, Xiaobo Sharon Hu, Liang Zhao, Cheng Zhuo, Xunzhao Yin

    Abstract: In a number of machine learning models, an input query is searched across the trained class vectors to find the closest feature class vector in cosine similarity metric. However, performing the cosine similarities between the vectors in Von-Neumann machines involves a large number of multiplications, Euclidean normalizations and division operations, thus incurring heavy hardware energy and latency… ▽ More

    Submitted 25 July, 2022; originally announced July 2022.

    Comments: Accepted by the 41st International Conference on Computer Aided Design (ICCAD), San Diego, USA

  20. arXiv:2207.07791  [pdf, other]

    cs.AR cs.ET cs.LG

    Associative Memory Based Experience Replay for Deep Reinforcement Learning

    Authors: Mengyuan Li, Arman Kazemi, Ann Franchesca Laguna, X. Sharon Hu

    Abstract: Experience replay is an essential component in deep reinforcement learning (DRL), which stores the experiences and generates experiences for the agent to learn in real time. Recently, prioritized experience replay (PER) has been proven to be powerful and widely deployed in DRL agents. However, implementing PER on traditional CPU or GPU architectures incurs significant latency overhead due to its f… ▽ More

    Submitted 15 July, 2022; originally announced July 2022.

    Comments: 9 pages, 9 figures. The work was accepted by the 41st International Conference on Computer-Aided Design (ICCAD), 2022, San Diego

  21. Computing-In-Memory Neural Network Accelerators for Safety-Critical Systems: Can Small Device Variations Be Disastrous?

    Authors: Zheyu Yan, Xiaobo Sharon Hu, Yiyu Shi

    Abstract: Computing-in-Memory (CiM) architectures based on emerging non-volatile memory (NVM) devices have demonstrated great potential for deep neural network (DNN) acceleration thanks to their high energy efficiency. However, NVM devices suffer from various non-idealities, especially device-to-device variations due to fabrication defects and cycle-to-cycle variations due to the stochastic behavior of devi… ▽ More

    Submitted 15 July, 2022; originally announced July 2022.

  22. arXiv:2205.13018  [pdf, other]

    cs.AR

    On the Reliability of Computing-in-Memory Accelerators for Deep Neural Networks

    Authors: Zheyu Yan, Xiaobo Sharon Hu, Yiyu Shi

    Abstract: Computing-in-memory with emerging non-volatile memory (nvCiM) is shown to be a promising candidate for accelerating deep neural networks (DNNs) with high energy efficiency. However, most non-volatile memory (NVM) devices suffer from reliability issues, resulting in a difference between actual data involved in the nvCiM computation and the weight value trained in the data center. Thus, models actua… ▽ More

    Submitted 25 May, 2022; originally announced May 2022.

    Comments: System Dependability And Analytics, 978-3-031-02062-9, Chapter 9

  23. arXiv:2204.07429  [pdf, other]

    cs.ET cs.AR cs.LG cs.NE

    Experimentally realized memristive memory augmented neural network

    Authors: Ruibin Mao, Bo Wen, Yahui Zhao, Arman Kazemi, Ann Franchesca Laguna, Michael Neimier, X. Sharon Hu, Xia Sheng, Catherine E. Graves, John Paul Strachan, Can Li

    Abstract: Lifelong on-device learning is a key challenge for machine intelligence, and this requires learning from few, often single, samples. Memory augmented neural network has been proposed to achieve the goal, but the memory module has to be stored in an off-chip memory due to its size. Therefore the practical use has been heavily limited. Previous works on emerging memory-based implementation have diff… ▽ More

    Submitted 15 April, 2022; originally announced April 2022.

    Comments: 54 pages, 21 figures, 3 tables

  24. arXiv:2202.09433  [pdf, other]

    cs.AR

    iMARS: An In-Memory-Computing Architecture for Recommendation Systems

    Authors: Mengyuan Li, Ann Franchesca Laguna, Dayane Reis, Xunzhao Yin, Michael Niemier, Xiaobo Sharon Hu

    Abstract: Recommendation systems (RecSys) suggest items to users by predicting their preferences based on historical data. Typical RecSys handle large embedding tables and many embedding table related operations. The memory size and bandwidth of the conventional computer architecture restrict the performance of RecSys. This work proposes an in-memory-computing (IMC) architecture (iMARS) for accelerating the… ▽ More

    Submitted 18 February, 2022; originally announced February 2022.

    Comments: Accepted by 59th Design Automation Conference (DAC)

  25. SWIM: Selective Write-Verify for Computing-in-Memory Neural Accelerators

    Authors: Zheyu Yan, Xiaobo Sharon Hu, Yiyu Shi

    Abstract: Computing-in-Memory architectures based on non-volatile emerging memories have demonstrated great potential for deep neural network (DNN) acceleration thanks to their high energy efficiency. However, these emerging devices can suffer from significant variations during the mapping process (i.e., programming weights to the devices), and if left undealt with, can cause significant accuracy degradation… ▽ More

    Submitted 16 February, 2022; originally announced February 2022.

  26. arXiv:2112.02231  [pdf, other]

    cs.CR cs.AR cs.ET

    IMCRYPTO: An In-Memory Computing Fabric for AES Encryption and Decryption

    Authors: Dayane Reis, Haoran Geng, Michael Niemier, Xiaobo Sharon Hu

    Abstract: This paper proposes IMCRYPTO, an in-memory computing (IMC) fabric for accelerating AES encryption and decryption. IMCRYPTO employs a unified structure to implement encryption and decryption in a single hardware architecture, with combined (Inv)SubBytes and (Inv)MixColumns steps. Because of this step-combination, as well as the high parallelism achieved by multiple units of random-access memory (RA… ▽ More

    Submitted 3 December, 2021; originally announced December 2021.

  27. arXiv:2110.02495  [pdf, other]

    cs.ET eess.SP

    Deep Random Forest with Ferroelectric Analog Content Addressable Memory

    Authors: Xunzhao Yin, Franz Müller, Ann Franchesca Laguna, Chao Li, Wenwen Ye, Qingrong Huang, Qinming Zhang, Zhiguo Shi, Maximilian Lederer, Nellie Laleni, Shan Deng, Zijian Zhao, Michael Niemier, Xiaobo Sharon Hu, Cheng Zhuo, Thomas Kämpfe, Kai Ni

    Abstract: Deep random forest (DRF), which incorporates the core features of deep learning and random forest (RF), exhibits comparable classification accuracy, interpretability, and low memory and computational overhead when compared with deep neural networks (DNNs) in various information processing tasks for edge intelligence. However, the development of efficient hardware to accelerate DRF is lagging behin… ▽ More

    Submitted 6 October, 2021; originally announced October 2021.

    Comments: 44 pages, 16 figures

  28. arXiv:2109.05691  [pdf, other]

    cs.LG

    RADARS: Memory Efficient Reinforcement Learning Aided Differentiable Neural Architecture Search

    Authors: Zheyu Yan, Weiwen Jiang, Xiaobo Sharon Hu, Yiyu Shi

    Abstract: Differentiable neural architecture search (DNAS) is known for its capacity in the automatic generation of superior neural networks. However, DNAS based methods suffer from memory usage explosion when the search space expands, which may prevent them from running successfully on even advanced GPU platforms. On the other hand, reinforcement learning (RL) based methods, while being memory efficient, a… ▽ More

    Submitted 13 September, 2021; originally announced September 2021.

  29. Uncertainty Modeling of Emerging Device-based Computing-in-Memory Neural Accelerators with Application to Neural Architecture Search

    Authors: Zheyu Yan, Da-Cheng Juan, Xiaobo Sharon Hu, Yiyu Shi

    Abstract: Emerging device-based Computing-in-memory (CiM) has been proved to be a promising candidate for high-energy efficiency deep neural network (DNN) computations. However, most emerging devices suffer uncertainty issues, resulting in a difference between actual data stored and the weight value it is designed to be. This leads to an accuracy drop from trained models to actually deployed platforms. In t… ▽ More

    Submitted 6 July, 2021; originally announced July 2021.

  30. arXiv:2107.02927  [pdf, other]

    eess.IV cs.CV

    Image Complexity Guided Network Compression for Biomedical Image Segmentation

    Authors: Suraj Mishra, Danny Z. Chen, X. Sharon Hu

    Abstract: Compression is a standard procedure for making convolutional neural networks (CNNs) adhere to some specific computing resource constraints. However, searching for a compressed architecture typically involves a series of time-consuming training/validation experiments to determine a good compromise between network size and performance accuracy. To address this, we propose an image complexity-guided… ▽ More

    Submitted 6 July, 2021; originally announced July 2021.

    Comments: ACM JETC

  31. arXiv:2106.12029  [pdf, other]

    cs.ET cs.AR

    MIMHD: Accurate and Efficient Hyperdimensional Inference Using Multi-Bit In-Memory Computing

    Authors: Arman Kazemi, Mohammad Mehdi Sharifi, Zhuowen Zou, Michael Niemier, X. Sharon Hu, Mohsen Imani

    Abstract: Hyperdimensional Computing (HDC) is an emerging computational framework that mimics important brain functions by operating over high-dimensional vectors, called hypervectors (HVs). In-memory computing implementations of HDC are desirable since they can significantly reduce data transfer overheads. All existing in-memory HDC platforms consider binary HVs where each dimension is represented with a s… ▽ More

    Submitted 22 June, 2021; originally announced June 2021.

    Comments: Accepted at ISLPED 2021

  32. arXiv:2106.11757  [pdf, other]

    cs.DC

    Application-driven Design Exploration for Dense Ferroelectric Embedded Non-volatile Memories

    Authors: Mohammad Mehdi Sharifi, Lillian Pentecost, Ramin Rajaei, Arman Kazemi, Qiuwen Lou, Gu-Yeon Wei, David Brooks, Kai Ni, X. Sharon Hu, Michael Niemier, Marco Donato

    Abstract: The memory wall bottleneck is a key challenge across many data-intensive applications. Multi-level FeFET-based embedded non-volatile memories are a promising solution for denser and more energy-efficient on-chip memory. However, reliable multi-level cell storage requires careful optimizations to minimize the design overhead costs. In this work, we investigate the interplay between FeFET device cha… ▽ More

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: Accepted at ISLPED 2021

  33. arXiv:2104.08554  [pdf, other]

    cs.CV

    Objective-Dependent Uncertainty Driven Retinal Vessel Segmentation

    Authors: Suraj Mishra, Danny Z. Chen, X. Sharon Hu

    Abstract: From diagnosing neovascular diseases to detecting white matter lesions, accurate tiny vessel segmentation in fundus images is critical. Promising results for accurate vessel segmentation have been known. However, their effectiveness in segmenting tiny vessels is still limited. In this paper, we study retinal vessel segmentation by incorporating tiny vessel segmentation into our framework for the o… ▽ More

    Submitted 17 April, 2021; originally announced April 2021.

    Comments: ISBI 2021

  34. arXiv:2011.07095  [pdf, other]

    cs.ET cs.LG

    In-Memory Nearest Neighbor Search with FeFET Multi-Bit Content-Addressable Memories

    Authors: Arman Kazemi, Mohammad Mehdi Sharifi, Ann Franchesca Laguna, Franz Müller, Ramin Rajaei, Ricardo Olivo, Thomas Kämpfe, Michael Niemier, X. Sharon Hu

    Abstract: Nearest neighbor (NN) search is an essential operation in many applications, such as one/few-shot learning and image classification. As such, fast and low-energy hardware support for accurate NN search is highly desirable. Ternary content-addressable memories (TCAMs) have been proposed to accelerate NN search for few-shot learning tasks by implementing $L_\infty$ and Hamming distance metrics, but… ▽ More

    Submitted 13 November, 2020; originally announced November 2020.

    Comments: To be published in DATE'21

  35. arXiv:2006.03178  [pdf, other]

    cs.DC cs.GT cs.NI

    Towards Privacy-aware Task Allocation in Social Sensing based Edge Computing Systems

    Authors: Daniel Zhang, Yue Ma, X. Sharon Hu, Dong Wang

    Abstract: With the advance in mobile computing, Internet of Things, and ubiquitous wireless connectivity, social sensing based edge computing (SSEC) has emerged as a new computation paradigm where people and their personally owned devices collect sensor measurements from the physical world and process them at the edge of the network. This paper focuses on a privacy-aware task allocation problem where the go… ▽ More

    Submitted 4 June, 2020; originally announced June 2020.

  36. Computing-in-Memory for Performance and Energy Efficient Homomorphic Encryption

    Authors: Dayane Reis, Jonathan Takeshita, Taeho Jung, Michael Niemier, Xiaobo Sharon Hu

    Abstract: Homomorphic encryption (HE) allows direct computations on encrypted data. Despite numerous research efforts, the practicality of HE schemes remains to be demonstrated. In this regard, the enormous size of ciphertexts involved in HE computations degrades computational efficiency. Near-memory Processing (NMP) and Computing-in-memory (CiM) - paradigms where computation is done within the memory bound… ▽ More

    Submitted 19 August, 2020; v1 submitted 5 May, 2020; originally announced May 2020.

    Comments: 14 pages

    Journal ref: IEEE Transactions on Very Large Scale Integration (VLSI) Systems ( Volume: 28, Issue: 11, Nov. 2020)

  37. arXiv:2004.06094  [pdf, other]

    cs.ET eess.SP

    A Device Non-Ideality Resilient Approach for Mapping Neural Networks to Crossbar Arrays

    Authors: Arman Kazemi, Cristobal Alessandri, Alan C. Seabaugh, X. Sharon Hu, Michael Niemier, Siddharth Joshi

    Abstract: We propose a technology-independent method, referred to as adjacent connection matrix (ACM), to efficiently map signed weight matrices to non-negative crossbar arrays. When compared to same-hardware-overhead mapping methods, using ACM leads to improvements of up to 20% in training accuracy for ResNet-20 with the CIFAR-10 dataset when training with 5-bit precision crossbar arrays or lower. When com… ▽ More

    Submitted 1 April, 2020; originally announced April 2020.

    Comments: Accepted at DAC'20

  38. FeCAM: A Universal Compact Digital and Analog Content Addressable Memory Using Ferroelectric

    Authors: Xunzhao Yin, Chao Li, Qingrong Huang, Li Zhang, Michael Niemier, Xiaobo Sharon Hu, Cheng Zhuo, Kai Ni

    Abstract: Ferroelectric field effect transistors (FeFETs) are being actively investigated with the potential for in-memory computing (IMC) over other non-volatile memories (NVMs). Content Addressable Memories (CAMs) are a form of IMC that performs parallel searches for matched entries over a memory array for a given input query. CAMs are widely used for data-centric applications that involve pattern matchin… ▽ More

    Submitted 17 July, 2020; v1 submitted 4 April, 2020; originally announced April 2020.

    Comments: 8 pages, 8 figures, accepted

    Journal ref: IEEE Transactions on Electron Devices, 2020

  39. arXiv:2004.00703  [pdf, other]

    cs.ET

    A Hybrid FeMFET-CMOS Analog Synapse Circuit for Neural Network Training and Inference

    Authors: Arman Kazemi, Ramin Rajaei, Kai Ni, Suman Datta, Michael Niemier, X. Sharon Hu

    Abstract: An analog synapse circuit based on ferroelectric-metal field-effect transistors is proposed, that offers 6-bit weight precision. The circuit is comprised of volatile least significant bits (LSBs) used solely during training, and non-volatile most significant bits (MSBs) used for both training and inference. The design works at a 1.8V logic-compatible voltage, provides 10^10 endurance cycles, and r… ▽ More

    Submitted 1 April, 2020; originally announced April 2020.

    Comments: Accepted at ISCAS'20 for oral presentation

  40. arXiv:1911.00139  [pdf, ps, other]

    cs.NE cs.LG

    Device-Circuit-Architecture Co-Exploration for Computing-in-Memory Neural Accelerators

    Authors: Weiwen Jiang, Qiuwen Lou, Zheyu Yan, Lei Yang, Jingtong Hu, Xiaobo Sharon Hu, Yiyu Shi

    Abstract: Co-exploration of neural architectures and hardware design is promising to simultaneously optimize network accuracy and hardware efficiency. However, state-of-the-art neural architecture search algorithms for the co-exploration are dedicated for the conventional von-neumann computing architecture, whose performance is heavily limited by the well-known memory wall. In this paper, we are the first t… ▽ More

    Submitted 20 March, 2020; v1 submitted 31 October, 2019; originally announced November 2019.

    Comments: 10 pages, 6 figures

  41. arXiv:1905.12679  [pdf, other]

    cs.ET cs.NE

    Nonvolatile Spintronic Memory Cells for Neural Networks

    Authors: Andrew W. Stephan, Qiuwen Lou, Michael Niemier, X. Sharon Hu, Steven J. Koester

    Abstract: A new spintronic nonvolatile memory cell analogous to 1T DRAM with non-destructive read is proposed. The cells can be used as neural computing units. A dual-circuit neural network architecture is proposed to leverage these devices against the complex operations involved in convolutional networks. Simulations based on HSPICE and Matlab were performed to study the performance of this architecture wh… ▽ More

    Submitted 29 May, 2019; originally announced May 2019.

  42. arXiv:1903.06649  [pdf, other]

    cs.ET cs.CV cs.DC

    Application-level Studies of Cellular Neural Network-based Hardware Accelerators

    Authors: Qiuwen Lou, Indranil Palit, Tang Li, Andras Horvath, Michael Niemier, X. Sharon Hu

    Abstract: As cost and performance benefits associated with Moore's Law scaling slow, researchers are studying alternative architectures (e.g., based on analog and/or spiking circuits) and/or computational models (e.g., convolutional and recurrent neural networks) to perform application-level tasks faster, more energy efficiently, and/or more accurately. We investigate cellular neural network (CeNN)-based co… ▽ More

    Submitted 12 June, 2019; v1 submitted 28 February, 2019; originally announced March 2019.

  43. arXiv:1902.02023  [pdf, other]

    cs.NI

    Fully Distributed Packet Scheduling Framework for Handling Disturbances in Lossy Real-Time Wireless Networks

    Authors: Tianyu Zhang, Tao Gong, Song Han, Qingxu Deng, Xiaobo Sharon Hu

    Abstract: Along with the rapid growth of Industrial Internet-of-Things (IIoT) applications and their penetration into many industry sectors, real-time wireless networks (RTWNs) have been playing a more critical role in providing real-time, reliable and secure communication services for such applications. A key challenge in RTWN management is how to ensure real-time Quality of Service (QoS) especially in th…

    Submitted 5 February, 2019; originally announced February 2019.

  44. Eva-CiM: A System-Level Performance and Energy Evaluation Framework for Computing-in-Memory Architectures

    Authors: Di Gao, Dayane Reis, Xiaobo Sharon Hu, Cheng Zhuo

    Abstract: Computing-in-Memory (CiM) architectures aim to reduce costly data transfers by performing arithmetic and logic operations in memory and hence relieve the pressure due to the memory wall. However, determining whether a given workload can really benefit from CiM, which memory hierarchy and what device technology should be adopted by a CiM architecture requires in-depth study that is not only time co…

    Submitted 15 January, 2020; v1 submitted 27 January, 2019; originally announced January 2019.

    Comments: 13 pages, 16 figures

  45. arXiv:1901.01578  [pdf, other]

    cs.CV

    CC-Net: Image Complexity Guided Network Compression for Biomedical Image Segmentation

    Authors: Suraj Mishra, Peixian Liang, Adam Czajka, Danny Z. Chen, X. Sharon Hu

    Abstract: Convolutional neural networks (CNNs) for biomedical image analysis are often of very large size, resulting in high memory requirement and high latency of operations. Searching for an acceptable compressed representation of the base CNN for a specific imaging application typically involves a series of time-consuming training/validation experiments to achieve a good compromise between network size a…

    Submitted 8 September, 2019; v1 submitted 6 January, 2019; originally announced January 2019.

    Comments: Updated FM energy dist. figure

  46. arXiv:1812.11027  [pdf, other]

    cs.LG stat.ML

    Exploring Weight Symmetry in Deep Neural Networks

    Authors: Xu Shell Hu, Sergey Zagoruyko, Nikos Komodakis

    Abstract: We propose to impose symmetry in neural network parameters to improve parameter usage and make use of dedicated convolution and matrix multiplication routines. Due to significant reduction in the number of parameters as a result of the symmetry constraints, one would expect a dramatic drop in accuracy. Surprisingly, we show that this is not the case, and, depending on network size, symmetry can ha…

    Submitted 10 January, 2019; v1 submitted 28 December, 2018; originally announced December 2018.

  47. arXiv:1811.02636  [pdf, other]

    cs.CV

    A mixed signal architecture for convolutional neural networks

    Authors: Qiuwen Lou, Chenyun Pan, John McGuiness, Andras Horvath, Azad Naeemi, Michael Niemier, X. Sharon Hu

    Abstract: Deep neural network (DNN) accelerators with improved energy and delay are desirable for meeting the requirements of hardware targeted for IoT and edge computing systems. Convolutional neural networks (CoNNs) belong to one of the most popular types of DNN architectures. This paper presents the design and evaluation of an accelerator for CoNNs. The system-level architecture is based on mixed-signal,…

    Submitted 2 May, 2019; v1 submitted 30 October, 2018; originally announced November 2018.

    Comments: 25 pages

  48. arXiv:1809.00110  [pdf, other]

    cs.CV

    DAC-SDC Low Power Object Detection Challenge for UAV Applications

    Authors: Xiaowei Xu, Xinyi Zhang, Bei Yu, X. Sharon Hu, Christopher Rowen, Jingtong Hu, Yiyu Shi

    Abstract: The 55th Design Automation Conference (DAC) held its first System Design Contest (SDC) in 2018. SDC'18 features a low power object detection challenge (LPODC) on designing and implementing novel algorithms for object detection in images taken from unmanned aerial vehicles (UAV). The dataset includes 95 categories and 150k images, and the hardware platforms include Nvidia's TX2 and Xilinx's PYN…

    Submitted 31 August, 2018; originally announced September 2018.

    Comments: 12 pages, 21 figures

  49. arXiv:1705.10591  [pdf, other]

    cs.DC cs.LG

    Optimizing Memory Efficiency for Convolution Kernels on Kepler GPUs

    Authors: Xiaoming Chen, Jianxu Chen, Danny Z. Chen, Xiaobo Sharon Hu

    Abstract: Convolution is a fundamental operation in many applications, such as computer vision, natural language processing, image processing, etc. Recent successes of convolutional neural networks in various deep learning applications put even higher demand on fast convolution. The high computation throughput and memory bandwidth of graphics processing units (GPUs) make GPUs a natural choice for accelerati…

    Submitted 29 May, 2017; originally announced May 2017.

  50. arXiv:1606.07467  [pdf, other]

    cs.ET cs.CC physics.ins-det

    Efficient Analog Circuits for Boolean Satisfiability

    Authors: Xunzhao Yin, Behnam Sedighi, Melinda Varga, Maria Ercsey-Ravasz, Zoltan Toroczkai, Xiaobo Sharon Hu

    Abstract: Efficient solutions to NP-complete problems would significantly benefit both science and industry. However, such problems are intractable on digital computers based on the von Neumann architecture, thus creating the need for alternative solutions to tackle such problems. Recently, a deterministic, continuous-time dynamical system (CTDS) was proposed (Nat. Phys. 7(12), 966 (2011)) to solve a r…

    Submitted 11 February, 2018; v1 submitted 22 June, 2016; originally announced June 2016.

    Comments: 9 pages, 9 Figures, 1 Table. Added journal info in version 2: IEEE Transactions on Very Large Scale Integration Systems (TVLSI) vol 26, No 1, January 2018, pp 155-167. DOI: 10.1109/TVLSI.2017.2754192

    ACM Class: C.1.3; F.1.1; G.1.7; F.2.3