Showing 1–26 of 26 results for author: Parhi, K K

Searching in archive cs.
  1. arXiv:2406.06879  [pdf, other]

    eess.SP cs.NE

    SpikePipe: Accelerated Training of Spiking Neural Networks via Inter-Layer Pipelining and Multiprocessor Scheduling

    Authors: Sai Sanjeet, Bibhu Datta Sahoo, Keshab K. Parhi

    Abstract: Spiking Neural Networks (SNNs) have gained popularity due to their high energy efficiency. Prior works have proposed various methods for training SNNs, including backpropagation-based methods. Training SNNs is computationally expensive compared to their conventional counterparts and would benefit from multiprocessor hardware acceleration. This is the first paper to propose inter-layer pipelining t…

    Submitted 10 June, 2024; originally announced June 2024.

  2. Robust Clustering using Hyperdimensional Computing

    Authors: Lulu Ge, Keshab K. Parhi

    Abstract: This paper addresses the clustering of data in the hyperdimensional computing (HDC) domain. In prior work, an HDC-based clustering framework, referred to as HDCluster, has been proposed. However, the performance of the existing HDCluster is not robust. The performance of HDCluster is degraded as the hypervectors for the clusters are chosen at random during the initialization step. To overcome this…

    Submitted 4 December, 2023; originally announced December 2023.

    Journal ref: IEEE Open Journal of Circuits and Systems, Vol. 5, pp. 102-116, 2024

  3. KyberMat: Efficient Accelerator for Matrix-Vector Polynomial Multiplication in CRYSTALS-Kyber Scheme via NTT and Polyphase Decomposition

    Authors: Weihang Tan, Yingjie Lao, Keshab K. Parhi

    Abstract: CRYSTALS-Kyber (Kyber) is one of the post-quantum cryptography (PQC) key-encapsulation mechanism (KEM) schemes selected during the standardization process. This paper addresses optimization for Kyber architecture with respect to latency and throughput constraints. Specifically, matrix-vector multiplication and number theoretic transform (NTT)-based polynomial multiplication are critical operations…

    Submitted 6 October, 2023; originally announced October 2023.

    Comments: Proc. 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), San Francisco, CA, Oct. 29 - Nov. 2, 2023

    Journal ref: 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD)

  4. A Low-Latency FFT-IFFT Cascade Architecture

    Authors: Keshab K. Parhi

    Abstract: This paper addresses the design of a partly-parallel cascaded FFT-IFFT architecture that does not require any intermediate buffer. Folding can be used to design partly-parallel architectures for FFT and IFFT. While many cascaded FFT-IFFT architectures can be designed using various folding sets for the FFT and the IFFT, for a specified folded FFT architecture, there exists a unique folding set to d…

    Submitted 16 September, 2023; originally announced September 2023.

    Journal ref: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, April 2024

  5. Long Polynomial Modular Multiplication using Low-Complexity Number Theoretic Transform

    Authors: Sin-Wei Chiu, Keshab K. Parhi

    Abstract: This tutorial aims to establish connections between polynomial modular multiplication over a ring and circular convolution and the discrete Fourier transform (DFT). The main goal is to extend the well-known theory of DFT in signal processing (SP) to other applications involving polynomials in a ring such as homomorphic encryption (HE). HE allows any third party to operate on the encrypted data without…

    Submitted 22 December, 2023; v1 submitted 21 June, 2023; originally announced June 2023.

    Comments: 10 pages

    Journal ref: IEEE Signal Processing Magazine, 41(1), pp. 92-102, Jan. 2024

  6. Tensor Decomposition for Model Reduction in Neural Networks: A Review

    Authors: Xingyi Liu, Keshab K. Parhi

    Abstract: Modern neural networks have revolutionized the fields of computer vision (CV) and Natural Language Processing (NLP). They are widely used for solving complex CV tasks and NLP tasks such as image classification, image generation, and machine translation. Most state-of-the-art neural networks are over-parameterized and require a high computational cost. One straightforward solution is to replace the…

    Submitted 26 April, 2023; originally announced April 2023.

    Comments: IEEE Circuits and Systems Magazine, 2023

    Journal ref: IEEE Circuits and Systems Magazine, pp. 8-28, Second Quarter, 2023

  7. SCV-GNN: Sparse Compressed Vector-based Graph Neural Network Aggregation

    Authors: Nanda K. Unnikrishnan, Joe Gould, Keshab K. Parhi

    Abstract: Graph neural networks (GNNs) have emerged as a powerful tool to process graph-based data in fields like communication networks, molecular interactions, chemistry, social networks, and neuroscience. GNNs are characterized by the ultra-sparse nature of their adjacency matrix that necessitates the development of dedicated hardware beyond general-purpose sparse matrix multipliers. While there has been…

    Submitted 26 April, 2023; originally announced April 2023.

    Journal ref: IEEE Transactions on Computer Aided Design (TCAD), 2023

  8. PaReNTT: Low-Latency Parallel Residue Number System and NTT-Based Long Polynomial Modular Multiplication for Homomorphic Encryption

    Authors: Weihang Tan, Sin-Wei Chiu, Antian Wang, Yingjie Lao, Keshab K. Parhi

    Abstract: High-speed long polynomial multiplication is important for applications in homomorphic encryption (HE) and lattice-based cryptosystems. This paper addresses low-latency hardware architectures for long polynomial modular multiplication using the number-theoretic transform (NTT) and inverse NTT (iNTT). Chinese remainder theorem (CRT) is used to decompose the modulus into multiple smaller moduli. Our…

    Submitted 6 July, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

    Journal ref: IEEE Transactions on Information Forensics and Security, Vol. 19, pp. 1646-1659, 2024

  9. arXiv:2208.14270  [pdf, other]

    cs.CR cs.AR

    Integral Sampler and Polynomial Multiplication Architecture for Lattice-based Cryptography

    Authors: Antian Wang, Weihang Tan, Keshab K. Parhi, Yingjie Lao

    Abstract: With the surge of the powerful quantum computer, lattice-based cryptography proliferated the latest cryptography hardware implementation due to its resistance against quantum computers. Among the computational blocks of lattice-based cryptography, the random errors produced by the sampler play a key role in ensuring the security of these schemes. This paper proposes an integral architecture for th…

    Submitted 30 August, 2022; originally announced August 2022.

    Comments: 6 pages, accepted by 35th IEEE Int. Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems

  10. Multi-Channel FFT Architectures Designed via Folding and Interleaving

    Authors: Nanda K. Unnikrishnan, Keshab K. Parhi

    Abstract: Computing the FFT of a single channel is well understood in the literature. However, computing the FFT of multiple channels in a systematic manner has not been fully addressed. This paper presents a framework to design a family of multi-channel FFT architectures using folding and interleaving. Three distinct multi-channel FFT architectures are presented in this paper. These architectur…

    Submitted 19 February, 2022; originally announced February 2022.

    Comments: Proc. 2022 IEEE International Symposium on Circuits and Systems (ISCAS)

    Journal ref: Proc. 2022 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 142-146

  11. High-Speed VLSI Architectures for Modular Polynomial Multiplication via Fast Filtering and Applications to Lattice-Based Cryptography

    Authors: Weihang Tan, Antian Wang, Yingjie Lao, Xinmiao Zhang, Keshab K. Parhi

    Abstract: This paper presents a low-latency hardware accelerator for modular polynomial multiplication for lattice-based post-quantum cryptography and homomorphic encryption applications. The proposed novel modular polynomial multiplier exploits the fast finite impulse response (FIR) filter architecture to reduce the computational complexity of the schoolbook modular polynomial multiplication. We also exten…

    Submitted 24 February, 2023; v1 submitted 22 October, 2021; originally announced October 2021.

    Journal ref: IEEE Trans. on Computers, 72(9), pp. 2454-2466, Sept. 2023

  12. arXiv:2108.06629  [pdf, other]

    cs.DC cs.AR cs.LG eess.SP

    LayerPipe: Accelerating Deep Neural Network Training by Intra-Layer and Inter-Layer Gradient Pipelining and Multiprocessor Scheduling

    Authors: Nanda K. Unnikrishnan, Keshab K. Parhi

    Abstract: The time required for training the neural networks increases with size, complexity, and depth. Training model parameters by backpropagation inherently creates feedback loops. These loops hinder efficient pipelining and scheduling of the tasks within the layer and between consecutive layers. Prior approaches, such as PipeDream, have exploited the use of delayed gradient to achieve inter-layer pipel…

    Submitted 14 August, 2021; originally announced August 2021.

    Comments: Proc. of the 2021 IEEE International Conference on Computer Aided Design (ICCAD)

    Journal ref: 2021 IEEE/ACM Conference on Computer Aided Design (ICCAD)

  13. Molecular MUX-Based Physical Unclonable Functions

    Authors: Lulu Ge, Keshab K. Parhi

    Abstract: Physical unclonable functions (PUFs) are small circuits that are widely used as hardware security primitives for authentication. These circuits can generate unique signatures because of the inherent randomness in manufacturing and process variations. This paper introduces molecular PUFs based on multiplexer (MUX) PUFs using dual-rail representation. It may be noted that molecular PUFs have not bee…

    Submitted 27 May, 2020; originally announced May 2020.

    Comments: Proc. of 2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), July 2020

  14. arXiv:2004.11204  [pdf, other]

    cs.LG cs.AI cs.CL cs.NE eess.SP

    Classification using Hyperdimensional Computing: A Review

    Authors: Lulu Ge, Keshab K. Parhi

    Abstract: Hyperdimensional (HD) computing is built upon its unique data type referred to as hypervectors. The dimension of these hypervectors is typically in the range of tens of thousands. Proposed to solve cognitive tasks, HD computing aims at calculating similarity among its data. Data transformation is realized by three operations, including addition, multiplication and permutation. Its ultra-wide data…

    Submitted 19 April, 2020; originally announced April 2020.

    Comments: IEEE Circuits and Systems Magazine (2020)

    Journal ref: IEEE Circuits and Systems Magazine, 20(2), pp. 30-47, June 2020

  15. arXiv:2004.10936  [pdf, other]

    cs.CV cs.AR

    PERMDNN: Efficient Compressed DNN Architecture with Permuted Diagonal Matrices

    Authors: Chunhua Deng, Siyu Liao, Yi Xie, Keshab K. Parhi, Xuehai Qian, Bo Yuan

    Abstract: The deep neural network (DNN) has emerged as the most important and popular artificial intelligence (AI) technique. The growth of model size poses a key energy efficiency challenge for the underlying computing platform. Thus, model compression becomes a crucial problem. However, the current approaches are limited by various drawbacks. Specifically, network sparsification approach suffers from irregular…

    Submitted 22 April, 2020; originally announced April 2020.

  16. Training DNA Perceptrons via Fractional Coding

    Authors: Xingyi Liu, Keshab K. Parhi

    Abstract: This paper describes a novel approach to synthesize molecular reactions to train a perceptron, i.e., a single-layered neural network, with sigmoidal activation function. The approach is based on fractional coding where a variable is represented by two molecules. The synergy between fractional coding in molecular computing and stochastic logic implementations in electronic computing is key to trans…

    Submitted 8 January, 2020; v1 submitted 16 November, 2019; originally announced November 2019.

    Comments: Proc. 2019 Asilomar Conference on Signals, Systems and Computers

  17. Molecular and DNA Artificial Neural Networks via Fractional Coding

    Authors: Xingyi Liu, Keshab K. Parhi

    Abstract: This paper considers implementation of artificial neural networks (ANNs) using molecular computing and DNA based on fractional coding. Prior work had addressed molecular two-layer ANNs with binary inputs and arbitrary weights. In prior work using fractional coding, a simple molecular perceptron that computes sigmoid of scaled weighted sum of the inputs was presented where the inputs and the weight…

    Submitted 7 March, 2020; v1 submitted 12 October, 2019; originally announced October 2019.

    Comments: IEEE Transactions on Biomedical Circuits and Systems, 2020

  18. arXiv:1610.07560  [pdf, other]

    cs.CV

    Automated OCT Segmentation for Images with DME

    Authors: Sohini Roychowdhury, Dara D. Koozekanani, Michael Reinsbach, Keshab K. Parhi

    Abstract: This paper presents a novel automated system that segments six sub-retinal layers from optical coherence tomography (OCT) image stacks of healthy patients and patients with diabetic macular edema (DME). First, each image in the OCT stack is denoised using a Wiener deconvolution algorithm that estimates the additive speckle noise variance using a novel Fourier-domain based structural error. This de…

    Submitted 24 October, 2016; originally announced October 2016.

    Comments: 31 pages, 7 figures, CRC Press Book Chapter, 2016

  19. LLR-based Successive-Cancellation List Decoder for Polar Codes with Multi-bit Decision

    Authors: Bo Yuan, Keshab K. Parhi

    Abstract: Due to their capacity-achieving property, polar codes have become one of the most attractive channel codes. To date, the successive cancellation list (SCL) decoding algorithm is the primary approach that can guarantee outstanding error-correcting performance of polar codes. However, the hardware designs of the original SCL decoder have large silicon area and long decoding latency. Although some re…

    Submitted 22 March, 2016; originally announced March 2016.

    Comments: accepted by IEEE Trans. Circuits and Systems II

  20. arXiv:1501.03235  [pdf]

    cs.IT

    Successive Cancellation Decoding of Polar Codes using Stochastic Computing

    Authors: Bo Yuan, Keshab K. Parhi

    Abstract: Polar codes have emerged as the most favorable channel codes for their unique capacity-achieving property. To date, numerous works have been reported for efficient design of polar codes decoder. However, these prior efforts focused on design of polar decoders via deterministic computation, while the behavior of stochastic polar decoder, which can have potential advantages such as low complexity an…

    Submitted 13 January, 2015; originally announced January 2015.

    Comments: accepted by International Symposium on Circuits and Systems (ISCAS) 2015

  21. arXiv:1411.7286  [pdf]

    cs.IT

    Algorithm and Architecture for Hybrid Decoding of Polar Codes

    Authors: Bo Yuan, Keshab K. Parhi

    Abstract: Polar codes are the first provable capacity-achieving forward error correction (FEC) codes. In general polar codes can be decoded via either successive cancellation (SC) or belief propagation (BP) decoding algorithm. However, to date practical applications of polar codes have been hindered by the long decoding latency and limited error-correcting performance problems. In this paper, based on our r…

    Submitted 26 November, 2014; originally announced November 2014.

    Comments: accepted by 2014 Asilomar Conference on Signals, Systems, and Computers

  22. arXiv:1411.7282  [pdf]

    cs.IT

    Successive Cancellation List Polar Decoder using Log-likelihood Ratios

    Authors: Bo Yuan, Keshab K. Parhi

    Abstract: Successive cancellation list (SCL) decoding algorithm is a powerful method that can help polar codes achieve excellent error-correcting performance. However, the current SCL algorithm and decoders are based on likelihood or log-likelihood forms, which render high hardware complexity. In this paper, we propose a log-likelihood-ratio (LLR)-based SCL (LLR-SCL) decoding algorithm, which only needs hal…

    Submitted 12 December, 2014; v1 submitted 26 November, 2014; originally announced November 2014.

    Comments: accepted by 2014 Asilomar Conference on Signals, Systems, and Computers

  23. arXiv:1406.7036  [pdf]

    cs.IT

    Low-Latency Successive-Cancellation List Decoders for Polar Codes with Multi-bit Decision

    Authors: Bo Yuan, Keshab K. Parhi

    Abstract: Polar codes, as the first provable capacity-achieving error-correcting codes, have received much attention in recent years. However, the decoding performance of polar codes with traditional successive-cancellation (SC) algorithm cannot match that of the low-density parity-check (LDPC) or turbo codes. Because SC list (SCL) decoding algorithm can significantly improve the error-correcting performanc…

    Submitted 20 September, 2014; v1 submitted 26 June, 2014; originally announced June 2014.

    Comments: submitted to IEEE TVLSI in Feb 2014, accepted in Sep. 2014

  24. arXiv:1111.0705  [pdf]

    cs.AR

    Low-Latency SC Decoder Architectures for Polar Codes

    Authors: Chuan Zhang, Bo Yuan, Keshab K. Parhi

    Abstract: Nowadays polar codes are becoming one of the most favorable capacity achieving error correction codes for their low encoding and decoding complexity. However, due to the large code length required by practical applications, the few existing successive cancellation (SC) decoder implementations still suffer from not only the high hardware cost but also the long decoding latency. This paper presents…

    Submitted 2 November, 2011; originally announced November 2011.

  25. arXiv:1111.0704  [pdf]

    cs.AR

    Reduced-Latency SC Polar Decoder Architectures

    Authors: Chuan Zhang, Bo Yuan, Keshab K. Parhi

    Abstract: Polar codes have become one of the most favorable capacity achieving error correction codes (ECC) along with their simple encoding method. However, among the very few prior successive cancellation (SC) polar decoder designs, the required long code length makes the decoding latency high. In this paper, conventional decoding algorithm is transformed with look-ahead techniques. This reduces the decod…

    Submitted 2 November, 2011; originally announced November 2011.

  26. arXiv:1111.0703  [pdf]

    cs.AR

    Efficient Network for Non-Binary QC-LDPC Decoder

    Authors: Chuan Zhang, Keshab K. Parhi

    Abstract: This paper presents approaches to develop efficient network for non-binary quasi-cyclic LDPC (QC-LDPC) decoders. By exploiting the intrinsic shifting and symmetry properties of the check matrices, significant reduction of memory size and routing complexity can be achieved. Two different efficient network architectures for Class-I and Class-II non-binary QC-LDPC decoders have been proposed, respect…

    Submitted 2 November, 2011; originally announced November 2011.