
doi 10.1109/ICASSP49357.2023.10097132

Improving performance of real-time full-band blind packet-loss concealment with predictive network

Authors: Viet-Anh Nguyen, Anh H. T. Nguyen, Andy W. H. Khong

Abstract: Packet loss concealment (PLC) is a tool for enhancing speech degradation caused by poor network conditions or underflow/overflow in audio processing pipelines. We propose a real-time recurrent method that leverages previous outputs to mitigate artefact of lost packets without the prior knowledge of loss mask. The proposed full-band recurrent network (FRN) model operates at 48 kHz, which is suitable for high-quality telecommunication applications. Experiment results highlight the superiority of FRN over an offline non-causal baseline and a top performer in a recent PLC challenge.

Submitted 12 May, 2023; v1 submitted 8 November, 2022; originally announced November 2022.

Comments: In Proceedings ICASSP 2023, 5 pages, 1 figure, 4 tables

arXiv:2110.13492

doi 10.1109/ICASSP43922.2022.9747699

TUNet: A Block-online Bandwidth Extension Model based on Transformers and Self-supervised Pretraining

Authors: Viet-Anh Nguyen, Anh H. T. Nguyen, Andy W. H. Khong

Abstract: We introduce a block-online variant of the temporal feature-wise linear modulation (TFiLM) model to achieve bandwidth extension. The proposed architecture simplifies the UNet backbone of the TFiLM to reduce inference time and employs an efficient transformer at the bottleneck to alleviate performance degradation. We also utilize self-supervised pretraining and data augmentation to enhance the quality of bandwidth extended signals and reduce the sensitivity with respect to downsampling methods. Experiment results on the VCTK dataset show that the proposed method outperforms several recent baselines in both intrusive and non-intrusive metrics. Pretraining and filter augmentation also help stabilize and enhance the overall performance.

Submitted 7 June, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

Comments: Published as a conference paper at ICASSP 2022, 5 pages, 4 figures, 3 tables

arXiv:2102.00196

doi 10.1109/ICASSP39728.2021.9414336

Directional Sparse Filtering using Weighted Lehmer Mean for Blind Separation of Unbalanced Speech Mixtures

Authors: Karn Watcharasupat, Anh H. T. Nguyen, Ching-Hui Ooi, Andy W. H. Khong

Abstract: In blind source separation of speech signals, the inherent imbalance in the source spectrum poses a challenge for methods that rely on single-source dominance for the estimation of the mixing matrix. We propose an algorithm based on the directional sparse filtering (DSF) framework that utilizes the Lehmer mean with learnable weights to adaptively account for source imbalance. Performance evaluation in multiple real acoustic environments show improvements in source separation compared to the baseline methods.

Submitted 14 May, 2021; v1 submitted 30 January, 2021; originally announced February 2021.

Comments: (c) 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Journal ref: Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 4485-4489

Showing 1–3 of 3 results for author: Nguyen, A H T