Showing 1–9 of 9 results for author: Beack, S

Searching in archive eess.
  1. Personalized Neural Speech Codec

    Authors: Inseon Jang, Haici Yang, Wootaek Lim, Seungkwon Beack, Minje Kim

    Abstract: In this paper, we propose a personalized neural speech codec, envisioning that personalization can reduce the model complexity or improve perceptual speech quality. Despite the common usage of speech codecs where only a single talker is involved on each side of the communication, personalizing a codec for the specific user has rarely been explored in the literature. First, we assume speakers can b…

    Submitted 31 March, 2024; originally announced April 2024.

    Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024, pp. 991-995

  2. arXiv:2308.12566  [pdf, other]

    eess.AS cs.SD eess.SP

    Hybrid noise shaping for audio coding using perfectly overlapped window

    Authors: Byeongho Jo, Seungkwon Beack

    Abstract: In recent years, audio coding technology has been standardized based on several frameworks that incorporate linear predictive coding (LPC). However, coding the transient signal using frequency-domain LP residual signals remains a challenge. To address this, temporal noise shaping (TNS) can be adapted, although it cannot be effectively operated since the estimated temporal envelope in the modified…

    Submitted 24 August, 2023; originally announced August 2023.

    Comments: accepted to WASPAA (IEEE Workshop on Applications of Signal Processing to Audio and Acoustics) 2023

  3. arXiv:2304.08076  [pdf, ps, other]

    eess.AS

    Audio coding with unified noise shaping and phase contrast control

    Authors: Byeongho Jo, Seungkwon Beack, Taejin Lee

    Abstract: Over the past decade, audio coding technology has seen standardization and the development of many frameworks incorporated with linear predictive coding (LPC). As LPC reduces information in the frequency domain, LP-based frequency-domain noise-shaping (FDNS) was previously proposed. To code transient signals effectively, FDNS with temporal noise shaping (TNS) has emerged. However, these mainly ope…

    Submitted 17 April, 2023; originally announced April 2023.

    Comments: Submitted and accepted in ICASSP (International Conference on Acoustics, Speech, and Signal Processing) 2023

  4. arXiv:2107.10843  [pdf, other]

    eess.AS cs.AI cs.SD

    HARP-Net: Hyper-Autoencoded Reconstruction Propagation for Scalable Neural Audio Coding

    Authors: Darius Petermann, Seungkwon Beack, Minje Kim

    Abstract: An autoencoder-based codec employs quantization to turn its bottleneck layer activation into bitstrings, a process that hinders information flow between the encoder and decoder parts. To circumvent this issue, we employ additional skip connections between the corresponding pair of encoder-decoder layers. The assumption is that, in a mirrored autoencoder topology, a decoder layer reconstructs the i…

    Submitted 23 July, 2021; v1 submitted 22 July, 2021; originally announced July 2021.

    Comments: Accepted to the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2021, Mohonk Mountain House, New Paltz, NY

  5. arXiv:2101.00054  [pdf, other]

    cs.SD cs.LG eess.AS

    Psychoacoustic Calibration of Loss Functions for Efficient End-to-End Neural Audio Coding

    Authors: Kai Zhen, Mi Suk Lee, Jongmo Sung, Seungkwon Beack, Minje Kim

    Abstract: Conventional audio coding technologies commonly leverage human perception of sound, or psychoacoustics, to reduce the bitrate while preserving the perceptual quality of the decoded audio signals. For neural audio codecs, however, the objective nature of the loss function usually leads to suboptimal sound quality as well as high run-time complexity due to the large model size. In this work, we pres…

    Submitted 31 December, 2020; originally announced January 2021.

    Journal ref: IEEE Signal Processing Letters, vol. 27, pp. 2159-2163, 2020

  6. arXiv:2008.12889  [pdf, other]

    eess.AS

    Source-Aware Neural Speech Coding for Noisy Speech Compression

    Authors: Haici Yang, Kai Zhen, Seungkwon Beack, Minje Kim

    Abstract: This paper introduces a novel neural network-based speech coding system that can process noisy speech effectively. The proposed source-aware neural audio coding (SANAC) system harmonizes a deep autoencoder-based source separation model and a neural coding system so that it can explicitly perform source separation and coding in the latent space. An added benefit of this system is that the codec can…

    Submitted 10 November, 2020; v1 submitted 28 August, 2020; originally announced August 2020.

  7. arXiv:2002.05604  [pdf, other]

    eess.AS cs.MM cs.SD eess.SP

    Efficient And Scalable Neural Residual Waveform Coding With Collaborative Quantization

    Authors: Kai Zhen, Mi Suk Lee, Jongmo Sung, Seungkwon Beack, Minje Kim

    Abstract: Scalability and efficiency are desired in neural speech codecs, which supports a wide range of bitrates for applications on various devices. We propose a collaborative quantization (CQ) scheme to jointly learn the codebook of LPC coefficients and the corresponding residuals. CQ does not simply shoehorn LPC to a neural network, but bridges the computational capacity of advanced neural network model…

    Submitted 13 February, 2020; originally announced February 2020.

    Comments: Accepted in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Barcelona, Spain, May 4-8, 2020

  8. arXiv:1906.07769  [pdf, other]

    eess.AS cs.LG cs.SD

    Cascaded Cross-Module Residual Learning towards Lightweight End-to-End Speech Coding

    Authors: Kai Zhen, Jongmo Sung, Mi Suk Lee, Seungkwon Beack, Minje Kim

    Abstract: Speech codecs learn compact representations of speech signals to facilitate data transmission. Many recent deep neural network (DNN) based end-to-end speech codecs achieve low bitrates and high perceptual quality at the cost of model complexity. We propose a cross-module residual learning (CMRL) pipeline as a module carrier with each module reconstructing the residual from its preceding modules. C…

    Submitted 13 September, 2019; v1 submitted 18 June, 2019; originally announced June 2019.

    Comments: Accepted for publication in INTERSPEECH 2019

    Journal ref: Published in Interspeech 2019

  9. arXiv:1809.10452  [pdf, other]

    eess.IV

    Context-adaptive Entropy Model for End-to-end Optimized Image Compression

    Authors: Jooyoung Lee, Seunghyun Cho, Seung-Kwon Beack

    Abstract: We propose a context-adaptive entropy model for use in end-to-end optimized image compression. Our model exploits two types of contexts, bit-consuming contexts and bit-free contexts, distinguished based upon whether additional bit allocation is required. Based on these contexts, we allow the model to more accurately estimate the distribution of each latent representation with a more generalized fo…

    Submitted 6 May, 2019; v1 submitted 27 September, 2018; originally announced September 2018.

    Comments: Published as a conference paper at ICLR 2019. The test code, evaluation results and reconstructed images are publicly available at https://github.com/JooyoungLeeETRI/CA_Entropy_Model