Search | arXiv e-print repository

Speech Enhancement Based on Non-stationary Noise-driven Geometric Spectral Subtraction and Phase Spectrum Compensation

Authors: Md Tauhidul Islam, Udoy Saha, K. T. Shahid, Ahmed Bin Hussain, Celia Shahnaz

Abstract: In this paper, a speech enhancement method based on noise compensation performed on short-time magnitude as well as phase spectra is presented. Unlike the conventional geometric approach (GA) to spectral subtraction (SS), here the noise estimate to be subtracted from the noisy speech spectrum is determined by exploiting the low-frequency regions of the current frame of noisy speech rather than depending only on the initial silence frames. This approach makes it possible to track non-stationary noise, resulting in a non-stationary noise-driven geometric approach to spectral subtraction for speech enhancement. The noise-compensated magnitude spectrum from the GA step is then recombined with the unchanged phase of the noisy speech spectrum and used in phase compensation to obtain an enhanced complex spectrum, which is used to produce an enhanced speech frame. Extensive simulations carried out using speech files available in the NOIZEUS database show that the proposed method consistently outperforms some recent speech enhancement methods on noisy speech corrupted by street or babble noise at different levels of SNR in terms of objective measures, spectrogram analysis and formal subjective listening tests.

Submitted 3 March, 2018; originally announced March 2018.

Comments: 13 pages, 10 figures, 8 tables. arXiv admin note: substantial text overlap with arXiv:1803.00396; text overlap with arXiv:1802.02665, arXiv:1802.05125, arXiv:1803.01841

arXiv:1803.01841 [pdf]

Enhancement of Noisy Speech exploiting a Gaussian Modeling based Threshold and a PDF Dependent Thresholding Function

Authors: Md Tauhidul Islam, Celia Shahnaz

Abstract: This paper presents a speech enhancement method in which an adaptive threshold is statistically determined based on Gaussian modeling of Teager energy (TE) operated perceptual wavelet packet (PWP) coefficients of noisy speech. To obtain enhanced speech, the threshold thus derived is applied to the PWP coefficients by employing a Gaussian-PDF-dependent custom thresholding function, which is designed based on a combination of modified hard and semisoft thresholding functions. The effectiveness of the proposed method is evaluated for speech signals corrupted by car and multi-talker babble noise through extensive simulations using the NOIZEUS database. The proposed method is found to outperform some state-of-the-art speech enhancement methods not only at high but also at low SNR levels in terms of standard objective measures and subjective evaluations, including formal listening tests.

Submitted 3 March, 2018; originally announced March 2018.

Comments: 22 pages, 18 figures, 8 tables; submitted to EURASIP Journal on Audio, Speech, and Music Processing. arXiv admin note: substantial text overlap with arXiv:1802.05962; text overlap with arXiv:1802.03472

arXiv:1803.00396 [pdf]

Speech Enhancement in Adverse Environments Based on Non-stationary Noise-driven Spectral Subtraction and SNR-dependent Phase Compensation

Authors: Md Tauhidul Islam, Asaduzzaman, Celia Shahnaz, Wei-Ping Zhu, M. Omair Ahmad

Abstract: A two-step enhancement method based on spectral subtraction and phase spectrum compensation is presented in this paper for noisy speech in adverse environments involving non-stationary noise and medium to low levels of SNR. The magnitude of the noisy speech spectrum is modified in the first step of the proposed method by a spectral subtraction approach, where a new noise estimation method based on the low-frequency information of the noisy speech is introduced. We argue that this method of noise estimation is capable of estimating the non-stationary noise accurately. The phase spectrum of the noisy speech is modified in the second step, consisting of phase spectrum compensation, where an SNR-dependent approach is incorporated to determine the amount of compensation to be imposed on the phase spectrum. A modified complex spectrum is obtained by combining the magnitude from the spectral subtraction step and the modified phase spectrum from the phase compensation step, which is found to be a better representation of the enhanced speech spectrum. Speech files available in the NOIZEUS database are used to carry out extensive simulations for evaluation of the proposed method.

Submitted 18 February, 2018; originally announced March 2018.

Comments: 15 pages, 10 figures, 8 tables. arXiv admin note: substantial text overlap with arXiv:1802.02665; text overlap with arXiv:1802.05125

arXiv:1802.05125 [pdf]

Enhancement of Noisy Speech with Low Speech Distortion Based on Probabilistic Geometric Spectral Subtraction

Authors: Md Tauhidul Islam, Celia Shahnaz, Wei-Ping Zhu, M. Omair Ahmad

Abstract: A speech enhancement method based on a probabilistic geometric approach to spectral subtraction (PGA) performed on the short-time magnitude spectrum is presented in this paper. A confidence parameter of noise estimation is introduced in the gain function of the proposed method to prevent subtraction of overestimated or underestimated noise, which not only removes the noise efficiently but also prevents speech distortion. The noise-compensated magnitude spectrum is then recombined with the unchanged phase spectrum to produce a modified complex spectrum prior to synthesizing an enhanced frame. Extensive simulations are carried out using the speech files available in the NOIZEUS database in order to evaluate the performance of the proposed method.

Submitted 12 February, 2018; originally announced February 2018.

Comments: To appear in Computer Speech and Language, 16 pages, 13 figures, 8 tables. arXiv admin note: text overlap with arXiv:1802.02665
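The abstract above describes subtracting a noise estimate from the short-time magnitude spectrum and recombining the result with the unchanged noisy phase. A minimal sketch of that general idea follows, using conventional power spectral subtraction rather than the probabilistic geometric gain function of the paper. The function name `spectral_subtract`, the `floor` parameter and the per-bin `noise_power` input are assumptions made for illustration only.

```python
import cmath
import math


def spectral_subtract(noisy_spectrum, noise_power, floor=0.01):
    """Conventional power spectral subtraction for one frame.

    noisy_spectrum: list of complex DFT bins of the noisy frame.
    noise_power:    list of estimated noise power per bin (assumed given).
    floor:          hypothetical spectral floor, as a fraction of the noisy
                    power, that keeps the subtracted power non-negative.
    """
    enhanced = []
    for y, n_pow in zip(noisy_spectrum, noise_power):
        y_pow = abs(y) ** 2
        # Subtract the noise power, but never go below the spectral floor.
        clean_pow = max(y_pow - n_pow, floor * y_pow)
        # Recombine the modified magnitude with the unchanged noisy phase.
        enhanced.append(cmath.rect(math.sqrt(clean_pow), cmath.phase(y)))
    return enhanced
```

Each returned bin keeps the phase of the corresponding noisy bin; only its magnitude is reduced. How the noise power is estimated, and the confidence parameter mentioned in the abstract, are outside the scope of this sketch.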

arXiv:1802.03472 [pdf]

Modeling of Teager Energy Operated Perceptual Wavelet Packet Coefficients with an Erlang-2 PDF for Real Time Enhancement of Noisy Speech

Authors: Md Tauhidul Islam, Celia Shahnaz, Wei-Ping Zhu, M. Omair Ahmad

Abstract: In this paper, for real-time enhancement of noisy speech, a method of threshold determination based on modeling of Teager energy (TE) operated perceptual wavelet packet (PWP) coefficients of the noisy speech and noise by an Erlang-2 PDF is presented. The proposed method is computationally much faster than the existing wavelet packet based thresholding methods. A custom thresholding function based on a combination of mu-law and semisoft thresholding functions is designed and used to apply the statistically derived threshold to the PWP coefficients. The proposed custom thresholding function works as a mu-law or a semisoft thresholding function, or their combination, based on the probability of speech presence and absence in a subband of the PWP-transformed noisy speech. Using the speech files available in the NOIZEUS database, a number of simulations are performed to evaluate the performance of the proposed method for speech signals in the presence of Gaussian white and street noises. The proposed method outperforms some state-of-the-art speech enhancement methods at both high and low SNR levels in terms of standard objective measures and subjective evaluations, including formal listening tests.

Submitted 9 February, 2018; originally announced February 2018.

Comments: To appear in Digital Signal Processing, 27 pages, 19 figures, 10 tables
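Two building blocks named in the abstract above, the Teager energy operator and semisoft thresholding, have standard textbook forms that can be sketched briefly. The code below shows only those generic definitions; it does not reproduce the Erlang-2 threshold derivation or the mu-law combination of the paper. The function names and the two thresholds `t1` and `t2` are assumptions chosen for illustration.

```python
def teager_energy(x):
    """Discrete Teager energy, psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returned for interior samples only, so the output has len(x) - 2 values.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def semisoft_threshold(c, t1, t2):
    """Semisoft (firm) thresholding of one coefficient, with 0 < t1 < t2.

    Coefficients at or below t1 in magnitude are zeroed, those above t2 are
    kept unchanged, and those in between are shrunk linearly.
    """
    a = abs(c)
    if a <= t1:
        return 0.0
    if a <= t2:
        sign = 1.0 if c > 0 else -1.0
        return sign * t2 * (a - t1) / (t2 - t1)
    return float(c)
```

The linear segment meets the identity at `t2`, so the function is continuous, which is the usual motivation for semisoft over hard thresholding.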

arXiv:1802.02665 [pdf]

A Divide and Conquer Strategy for Musical Noise-free Speech Enhancement in Adverse Environments

Authors: Md Tauhidul Islam, Celia Shahnaz, Wei-Ping Zhu, M. Omair Ahmad

Abstract: A divide and conquer strategy for enhancement of noisy speech in adverse environments involving lower levels of SNR is presented in this paper, where the total system of speech enhancement is divided into two separate steps. The first step is based on noise compensation on the short-time magnitude and the second step is based on phase compensation. The magnitude spectrum is compensated based on a modified spectral subtraction method in which the cross-terms containing spectra of noise and clean speech, which are neglected in traditional spectral subtraction methods, are taken into consideration. By employing the modified magnitude and unchanged phase, a procedure is formulated to compensate for the overestimation or underestimation of noise by a phase compensation method based on the probability of speech presence. A modified complex spectrum based on these two steps is obtained to synthesize a musical noise-free enhanced speech. Extensive simulations are carried out using the speech files available in the NOIZEUS database in order to evaluate the performance of the proposed method. It is shown in terms of objective measures, spectrogram analysis and formal subjective listening tests that the proposed method consistently outperforms some state-of-the-art speech enhancement methods for noisy speech corrupted by street or babble noise at very low as well as medium levels of SNR.

Submitted 7 February, 2018; originally announced February 2018.

Comments: 11 pages, 8 tables, 12 figures
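The phase compensation step mentioned in the abstract above can be illustrated with one commonly described form of phase spectrum compensation: a real offset proportional to the noise magnitude is added with opposite signs to the two halves of the spectrum, and the phase of the result replaces the noisy phase. The sketch below uses a fixed weight and is not the speech-presence-probability procedure of the paper. The function name `phase_compensate` and the constant `lam` are assumptions for illustration.

```python
import cmath


def phase_compensate(noisy_spectrum, noise_mag, lam=3.0):
    """Phase spectrum compensation for one frame of N DFT bins.

    noisy_spectrum: list of complex DFT bins of the noisy frame.
    noise_mag:      list of estimated noise magnitudes per bin (assumed given).
    lam:            hypothetical fixed compensation weight.
    """
    n = len(noisy_spectrum)
    out = []
    for k, (y, d) in enumerate(zip(noisy_spectrum, noise_mag)):
        # Antisymmetric weighting: +1 below N/2, -1 above, 0 at DC and N/2.
        if 0 < 2 * k < n:
            psi = 1.0
        elif 2 * k > n:
            psi = -1.0
        else:
            psi = 0.0
        compensated = y + lam * psi * d
        # Keep the noisy magnitude, take the phase of the compensated bin.
        out.append(cmath.rect(abs(y), cmath.phase(compensated)))
    return out
```

In this form the magnitudes are untouched and only the phase changes. The modified spectrum is no longer conjugate symmetric, and the real part of its inverse transform is normally taken to synthesize the frame, which attenuates bins where the noise offset dominates.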

arXiv:1710.01115 [pdf, other]

doi 10.1109/R10-HTC.2017.8289058

Detection of Inferior Myocardial Infarction using Shallow Convolutional Neural Networks

Authors: Tahsin Reasat, Celia Shahnaz

Abstract: Myocardial infarction is one of the leading causes of death worldwide. This paper presents a Convolutional Neural Network (CNN) architecture which takes raw electrocardiography (ECG) signals from leads II, III and aVF and differentiates between inferior myocardial infarction (IMI) and healthy signals. The performance of the model is evaluated on IMI and healthy signals obtained from the Physikalisch-Technische Bundesanstalt (PTB) database. A subject-oriented approach is taken to assess the generalization capability of the model, which is compared with the current state of the art. In a subject-oriented approach, the network is tested on one patient and trained on the rest of the patients. Our model achieved superior metric scores (accuracy = 84.54%, sensitivity = 85.33% and specificity = 84.09%) when compared to the benchmark. We also analyzed the discriminating strength of the features extracted by the convolutional layers by means of the geometric separability index and Euclidean distance and compared it with the benchmark model.

Submitted 18 August, 2018; v1 submitted 3 October, 2017; originally announced October 2017.

MSC Class: 68T10

ACM Class: I.5.1; I.5.4
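The subject-oriented evaluation described in the abstract above (test on one patient, train on the rest) corresponds to a leave-one-subject-out split. A minimal sketch of how such folds could be generated is given below; the function name `leave_one_subject_out` and the list-of-IDs input are assumptions for illustration and are not taken from the paper.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out, train_idx, test_idx) for each distinct subject.

    subject_ids: one subject label per recording, in dataset order.
    Recordings of the held-out subject form the test set; all recordings
    of the remaining subjects form the training set.
    """
    for held_out in sorted(set(subject_ids)):
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx
```

Splitting by subject rather than by recording keeps every recording of a patient on one side of the split, so the reported scores reflect performance on unseen patients.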

Showing 1–7 of 7 results for author: Shahnaz, C