Showing 1–11 of 11 results for author: Zhen, K

  1. arXiv:2210.09188  [pdf, other]

    cs.SD cs.LG eess.AS

    Sub-8-bit quantization for on-device speech recognition: a regularization-free approach

    Authors: Kai Zhen, Martin Radfar, Hieu Duy Nguyen, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris

    Abstract: For on-device automatic speech recognition (ASR), quantization aware training (QAT) is ubiquitous to achieve the trade-off between model predictive performance and efficiency. Among existing QAT methods, one major drawback is that the quantization centroids have to be predetermined and fixed. To overcome this limitation, we introduce a regularization-free, "soft-to-hard" compression mechanism with…

    Submitted 1 November, 2022; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: Accepted for publication at IEEE SLT'22

  2. arXiv:2206.15408  [pdf, other]

    eess.AS cs.AI eess.SP

    Sub-8-Bit Quantization Aware Training for 8-Bit Neural Network Accelerator with On-Device Speech Recognition

    Authors: Kai Zhen, Hieu Duy Nguyen, Raviteja Chinta, Nathan Susanj, Athanasios Mouchtaris, Tariq Afzal, Ariya Rastrow

    Abstract: We present a novel sub-8-bit quantization-aware training (S8BQAT) scheme for 8-bit neural network accelerators. Our method is inspired from Lloyd-Max compression theory with practical adaptations for a feasible computational overhead during training. With the quantization centroids derived from a 32-bit baseline, we augment training loss with a Multi-Regional Absolute Cosine (MRACos) regularizer t…

    Submitted 30 June, 2022; originally announced June 2022.

    Comments: Accepted for publication in INTERSPEECH 2022

  3. arXiv:2202.13588   

    eess.IV cs.CV

    Using Multi-scale SwinTransformer-HTC with Data augmentation in CoNIC Challenge

    Authors: Chia-Yen Lee, Hsiang-Chin Chien, Ching-** Wang, Hong Yen, Kai-Wen Zhen, Hong-Kun Lin

    Abstract: Colorectal cancer is one of the most common cancers worldwide, so early pathological examination is very important. However, it is time-consuming and labor-intensive to identify the number and type of cells on H&E images in clinical. Therefore, automatic segmentation and classification task and counting the cellular composition of H&E images from pathological sections is proposed by CoNIC Challeng…

    Submitted 16 April, 2024; v1 submitted 28 February, 2022; originally announced February 2022.

    Comments: Errors have been identified in the analysis

  4. arXiv:2103.14776  [pdf, other]

    eess.AS cs.LG cs.SD

    Scalable and Efficient Neural Speech Coding: A Hybrid Design

    Authors: Kai Zhen, Jongmo Sung, Mi Suk Lee, Seungkwon Beack, Minje Kim

    Abstract: We present a scalable and efficient neural waveform coding system for speech compression. We formulate the speech coding problem as an autoencoding task, where a convolutional neural network (CNN) performs encoding and decoding as a neural waveform codec (NWC) during its feedforward routine. The proposed NWC also defines quantization and entropy coding as a trainable module, so the coding artifact…

    Submitted 27 November, 2021; v1 submitted 26 March, 2021; originally announced March 2021.

    Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing (IEEE/ACM TASLP), 2021 (Accepted for publication)

  5. arXiv:2102.04932  [pdf, other]

    cs.LG cs.AI cs.CL cs.SD eess.AS

    Sparsification via Compressed Sensing for Automatic Speech Recognition

    Authors: Kai Zhen, Hieu Duy Nguyen, Feng-Ju Chang, Athanasios Mouchtaris, Ariya Rastrow

    Abstract: In order to achieve high accuracy for machine learning (ML) applications, it is essential to employ models with a large number of parameters. Certain applications, such as Automatic Speech Recognition (ASR), however, require real-time interactions with users, hence compelling the model to have as low latency as possible. Deploying large scale ML applications thus necessitates model quantization an…

    Submitted 9 February, 2021; originally announced February 2021.

    Comments: 5 pages, accepted for publication in (ICASSP 2021) 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing. June 6-12, 2021. Location: Toronto, ON, Canada

  6. arXiv:2101.00054  [pdf, other]

    cs.SD cs.LG eess.AS

    Psychoacoustic Calibration of Loss Functions for Efficient End-to-End Neural Audio Coding

    Authors: Kai Zhen, Mi Suk Lee, Jongmo Sung, Seungkwon Beack, Minje Kim

    Abstract: Conventional audio coding technologies commonly leverage human perception of sound, or psychoacoustics, to reduce the bitrate while preserving the perceptual quality of the decoded audio signals. For neural audio codecs, however, the objective nature of the loss function usually leads to suboptimal sound quality as well as high run-time complexity due to the large model size. In this work, we pres…

    Submitted 31 December, 2020; originally announced January 2021.

    Journal ref: IEEE Signal Processing Letters, vol. 27, pp. 2159-2163, 2020

  7. arXiv:2008.12889  [pdf, other]

    eess.AS

    Source-Aware Neural Speech Coding for Noisy Speech Compression

    Authors: Haici Yang, Kai Zhen, Seungkwon Beack, Minje Kim

    Abstract: This paper introduces a novel neural network-based speech coding system that can process noisy speech effectively. The proposed source-aware neural audio coding (SANAC) system harmonizes a deep autoencoder-based source separation model and a neural coding system so that it can explicitly perform source separation and coding in the latent space. An added benefit of this system is that the codec can…

    Submitted 10 November, 2020; v1 submitted 28 August, 2020; originally announced August 2020.

  8. arXiv:2002.05604  [pdf, other]

    eess.AS cs.MM cs.SD eess.SP

    Efficient And Scalable Neural Residual Waveform Coding With Collaborative Quantization

    Authors: Kai Zhen, Mi Suk Lee, Jongmo Sung, Seungkwon Beack, Minje Kim

    Abstract: Scalability and efficiency are desired in neural speech codecs, which supports a wide range of bitrates for applications on various devices. We propose a collaborative quantization (CQ) scheme to jointly learn the codebook of LPC coefficients and the corresponding residuals. CQ does not simply shoehorn LPC to a neural network, but bridges the computational capacity of advanced neural network model…

    Submitted 13 February, 2020; originally announced February 2020.

    Comments: Accepted in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Barcelona, Spain, May 4-8, 2020

  9. arXiv:1908.06468  [pdf, other]

    cs.SD cs.LG eess.AS

    A Dual-Staged Context Aggregation Method Towards Efficient End-To-End Speech Enhancement

    Authors: Kai Zhen, Mi Suk Lee, Minje Kim

    Abstract: In speech enhancement, an end-to-end deep neural network converts a noisy speech signal to a clean speech directly in time domain without time-frequency transformation or mask estimation. However, aggregating contextual information from a high-resolution time domain signal with an affordable model complexity still remains challenging. In this paper, we propose a densely connected convolutional and…

    Submitted 6 February, 2020; v1 submitted 18 August, 2019; originally announced August 2019.

    Comments: Accepted in Proceedings of the ICASSP, Barcelona, Spain, May 4-8, 2020

  10. arXiv:1906.07769  [pdf, other]

    eess.AS cs.LG cs.SD

    Cascaded Cross-Module Residual Learning towards Lightweight End-to-End Speech Coding

    Authors: Kai Zhen, Jongmo Sung, Mi Suk Lee, Seungkwon Beack, Minje Kim

    Abstract: Speech codecs learn compact representations of speech signals to facilitate data transmission. Many recent deep neural network (DNN) based end-to-end speech codecs achieve low bitrates and high perceptual quality at the cost of model complexity. We propose a cross-module residual learning (CMRL) pipeline as a module carrier with each module reconstructing the residual from its preceding modules. C…

    Submitted 13 September, 2019; v1 submitted 18 June, 2019; originally announced June 2019.

    Comments: Accepted for publication in INTERSPEECH 2019

    Journal ref: Published in Interspeech 2019

  11. arXiv:1801.09774  [pdf, other]

    cs.SD eess.AS

    On Psychoacoustically Weighted Cost Functions Towards Resource-Efficient Deep Neural Networks for Speech Denoising

    Authors: Kai Zhen, Aswin Sivaraman, Jongmo Sung, Minje Kim

    Abstract: We present a psychoacoustically enhanced cost function to balance network complexity and perceptual performance of deep neural networks for speech denoising. While training the network, we utilize perceptual weights added to the ordinary mean-squared error to emphasize contribution from frequency bins which are most audible while ignoring error from inaudible bins. To generate the weights, we empl…

    Submitted 29 January, 2018; originally announced January 2018.

    Comments: 5 pages, 4 figures