Showing 1–15 of 15 results for author: Zhang, G L

  1. arXiv:2406.14319  [pdf, other]

    cs.AI cs.CL

    LiveMind: Low-latency Large Language Models with Simultaneous Inference

    Authors: Chuangtao Chen, Grace Li Zhang, Xunzhao Yin, Cheng Zhuo, Ulf Schlichtmann, Bing Li

    Abstract: In this paper, we introduce a novel low-latency inference framework for large language models (LLMs) inference which enables LLMs to perform inferences with incomplete prompts. By reallocating computational processes to prompt input phase, we achieve a substantial reduction in latency, thereby significantly enhancing the interactive experience for users of LLMs. The framework adeptly manages the v…

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2402.18595  [pdf, other]

    cs.AR cs.CE cs.LG

    EncodingNet: A Novel Encoding-based MAC Design for Efficient Neural Network Acceleration

    Authors: Bo Liu, Grace Li Zhang, Xunzhao Yin, Ulf Schlichtmann, Bing Li

    Abstract: Deep neural networks (DNNs) have achieved great breakthroughs in many fields such as image classification and natural language processing. However, the execution of DNNs needs to conduct massive numbers of multiply-accumulate (MAC) operations on hardware and thus incurs a large power consumption. To address this challenge, we propose a novel digital MAC design based on encoding. In this new design…

    Submitted 25 February, 2024; originally announced February 2024.

  3. arXiv:2312.05875  [pdf, other]

    cs.AI

    Class-Aware Pruning for Efficient Neural Networks

    Authors: Mengnan Jiang, Jingcun Wang, Amro Eldebiky, Xunzhao Yin, Cheng Zhuo, Ing-Chao Lin, Grace Li Zhang

    Abstract: Deep neural networks (DNNs) have demonstrated remarkable success in various fields. However, the large number of floating-point operations (FLOPs) in DNNs poses challenges for their deployment in resource-constrained applications, e.g., edge devices. To address the problem, pruning has been introduced to reduce the computational cost in executing DNNs. Previous pruning strategies are based on weig…

    Submitted 18 February, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

    Comments: Accepted by Design Automation and Test in Europe (DATE) 2024

  4. arXiv:2309.13443  [pdf, other]

    cs.LG

    Early-Exit with Class Exclusion for Efficient Inference of Neural Networks

    Authors: Jingcun Wang, Bing Li, Grace Li Zhang

    Abstract: Deep neural networks (DNNs) have been successfully applied in various fields. In DNNs, a large number of multiply-accumulate (MAC) operations are required to be performed, posing critical challenges in applying them in resource-constrained platforms, e.g., edge devices. To address this challenge, in this paper, we propose a class-based early-exit for dynamic inference. Instead of pushing DNNs to m…

    Submitted 17 February, 2024; v1 submitted 23 September, 2023; originally announced September 2023.

  5. arXiv:2309.10510  [pdf, other]

    eess.SY cs.NE

    Logic Design of Neural Networks for High-Throughput and Low-Power Applications

    Authors: Kangwei Xu, Grace Li Zhang, Ulf Schlichtmann, Bing Li

    Abstract: Neural networks (NNs) have been successfully deployed in various fields. In NNs, a large number of multiply-accumulate (MAC) operations need to be performed. Most existing digital hardware platforms rely on parallel MAC units to accelerate these MAC operations. However, under a given area constraint, the number of MAC units in such platforms is limited, so MAC units have to be reused to perform MAC…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: accepted by ASPDAC 2024

  6. arXiv:2306.07294  [pdf, other]

    cs.LG cs.AI cs.NE

    Computational and Storage Efficient Quadratic Neurons for Deep Neural Networks

    Authors: Chuangtao Chen, Grace Li Zhang, Xunzhao Yin, Cheng Zhuo, Ulf Schlichtmann, Bing Li

    Abstract: Deep neural networks (DNNs) have been widely deployed across diverse domains such as computer vision and natural language processing. However, the impressive accomplishments of DNNs have been realized alongside extensive computational demands, thereby impeding their applicability on resource-constrained devices. To address this challenge, many researchers have been focusing on basic neuron structu…

    Submitted 27 November, 2023; v1 submitted 10 June, 2023; originally announced June 2023.

    Comments: Accepted by Design Automation and Test in Europe (DATE) 2024

  7. arXiv:2303.13997  [pdf, other]

    cs.NE cs.AI

    PowerPruning: Selecting Weights and Activations for Power-Efficient Neural Network Acceleration

    Authors: Richard Petri, Grace Li Zhang, Yiran Chen, Ulf Schlichtmann, Bing Li

    Abstract: Deep neural networks (DNNs) have been successfully applied in various fields. A major challenge of deploying DNNs, especially on edge devices, is power consumption, due to the large number of multiply-and-accumulate (MAC) operations. To address this challenge, we propose PowerPruning, a novel method to reduce power consumption in digital neural network accelerators by selecting weights that lead t…

    Submitted 27 November, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

    Comments: accepted by Design Automation Conference (DAC) 2023

  8. arXiv:2211.14928  [pdf, ps, other]

    cs.LG

    Class-based Quantization for Neural Networks

    Authors: Wenhao Sun, Grace Li Zhang, Huaxi Gu, Bing Li, Ulf Schlichtmann

    Abstract: In deep neural networks (DNNs), there are a huge number of weights and multiply-and-accumulate (MAC) operations. Accordingly, it is challenging to apply DNNs on resource-constrained platforms, e.g., mobile phones. Quantization is a method to reduce the size and the computational complexity of DNNs. Existing quantization methods either require hardware overhead to achieve a non-uniform quantization…

    Submitted 27 November, 2022; originally announced November 2022.

    Comments: accepted by DATE2023 (Design, Automation and Test in Europe)

  9. arXiv:2211.14926  [pdf, other]

    cs.LG

    SteppingNet: A Stepping Neural Network with Incremental Accuracy Enhancement

    Authors: Wenhao Sun, Grace Li Zhang, Xunzhao Yin, Cheng Zhuo, Huaxi Gu, Bing Li, Ulf Schlichtmann

    Abstract: Deep neural networks (DNNs) have successfully been applied in many fields in the past decades. However, the increasing number of multiply-and-accumulate (MAC) operations in DNNs prevents their application in resource-constrained and resource-varying platforms, e.g., mobile phones and autonomous vehicles. In such platforms, neural networks need to provide acceptable results quickly and the accuracy…

    Submitted 27 November, 2022; originally announced November 2022.

    Comments: accepted by DATE2023 (Design, Automation and Test in Europe)

  10. arXiv:2211.14917  [pdf, other]

    cs.AR cs.LG

    CorrectNet: Robustness Enhancement of Analog In-Memory Computing for Neural Networks by Error Suppression and Compensation

    Authors: Amro Eldebiky, Grace Li Zhang, Georg Boecherer, Bing Li, Ulf Schlichtmann

    Abstract: The last decade has witnessed the breakthrough of deep neural networks (DNNs) in many fields. With the increasing depth of DNNs, hundreds of millions of multiply-and-accumulate (MAC) operations need to be executed. To accelerate such operations efficiently, analog in-memory computing platforms based on emerging devices, e.g., resistive RAM (RRAM), have been introduced. These acceleration platforms…

    Submitted 27 November, 2022; originally announced November 2022.

    Comments: Accepted by DATE 2023 (Design, Automation and Test in Europe)

  11. arXiv:2203.05516  [pdf, other]

    cs.AR

    VirtualSync+: Timing Optimization with Virtual Synchronization

    Authors: Grace Li Zhang, Bing Li, Xing Huang, Xunzhao Yin, Cheng Zhuo, Masanori Hashimoto, Ulf Schlichtmann

    Abstract: In digital circuit designs, sequential components such as flip-flops are used to synchronize signal propagations. Logic computations are aligned at and thus isolated by flip-flop stages. Although this fully synchronous style can reduce design efforts significantly, it may affect circuit performance negatively, because sequential components can only introduce delays into signal propagations but nev…

    Submitted 10 March, 2022; originally announced March 2022.

  12. TimingCamouflage+: Netlist Security Enhancement with Unconventional Timing (with Appendix)

    Authors: Grace Li Zhang, Bing Li, Meng Li, Bei Yu, David Z. Pan, Michaela Brunner, Georg Sigl, Ulf Schlichtmann

    Abstract: With recent advances in reverse engineering, attackers can reconstruct a netlist to counterfeit chips by opening the die and scanning all layers of authentic chips. This relatively easy counterfeiting is made possible by the use of the standard simple clocking scheme, where all combinational blocks function within one clock period, so that a netlist of combinational logic gates and flip-flops is s…

    Submitted 2 March, 2020; originally announced March 2020.

  13. PieceTimer: A Holistic Timing Analysis Framework Considering Setup/Hold Time Interdependency Using A Piecewise Model

    Authors: Grace Li Zhang, Bing Li, Ulf Schlichtmann

    Abstract: In static timing analysis, clock-to-q delays of flip-flops are considered as constants. Setup times and hold times are characterized separately and also used as constants. The characterized delays, setup times and hold times, are applied in timing analysis independently to verify the performance of circuits. In reality, however, clock-to-q delays of flip-flops depend on both setup and hold tim…

    Submitted 14 May, 2017; originally announced May 2017.

    Comments: IEEE/ACM International Conference on Computer-Aided Design (ICCAD), November 2016

  14. EffiTest: Efficient Delay Test and Statistical Prediction for Configuring Post-silicon Tunable Buffers

    Authors: Grace Li Zhang, Bing Li, Ulf Schlichtmann

    Abstract: At nanometer manufacturing technology nodes, process variations significantly affect circuit performance. To combat them, post-silicon clock tuning buffers can be deployed to balance timing budgets of critical paths for each individual chip after manufacturing. The challenge of this method is that path delays should be measured for each chip to configure the tuning buffers properly. Current m…

    Submitted 14 May, 2017; originally announced May 2017.

    Comments: ACM/IEEE Design Automation Conference (DAC), June 2016

  15. Sampling-based Buffer Insertion for Post-Silicon Yield Improvement under Process Variability

    Authors: Grace Li Zhang, Bing Li, Ulf Schlichtmann

    Abstract: At submicron manufacturing technology nodes, process variations affect circuit performance significantly. This trend leads to a large timing margin and thus overdesign to maintain yield. To combat this pessimism, post-silicon clock tuning buffers can be inserted into circuits to balance timing budgets of critical paths with their neighbors. After manufacturing, these clock buffers can be configured…

    Submitted 14 May, 2017; originally announced May 2017.

    Comments: Design, Automation and Test in Europe (DATE), 2016