-
End-to-end LSTM based estimation of volcano event epicenter localization
Authors:
Nestor Becerra Yoma,
Jorge Wuth,
Andres Pinto,
Nicolas de Celis,
Jorge Celis,
Fernando Huenupan
Abstract:
In this paper, an end-to-end LSTM-based scheme is proposed to address the problem of volcano event localization without any a priori model relating phase picking with localization estimation. It is worth emphasizing that automatic phase picking in volcano signals is highly inaccurate because of the short distances between the event epicenters and the seismograph stations. LSTM was chosen due to its capability to capture the dynamics of time-varying signals, to remove or add information within the memory cell state, and to model long-term dependencies. A brief insight into LSTM is also provided. The results presented in this paper show that the LSTM-based architecture provided a success rate (i.e., the rate of events localized with an error smaller than 1.0 km) equal to 48.5%, which is dramatically superior to the one delivered by automatic phase picking. Moreover, the proposed end-to-end LSTM-based method gave a success rate 18% higher than a CNN.
Submitted 27 October, 2021;
originally announced October 2021.
-
Non causal deep learning based dereverberation
Authors:
Jorge Wuth,
Richard M. Stern,
Nestor Becerra Yoma
Abstract:
In this paper we demonstrate the effectiveness of non-causal context for mitigating the effects of reverberation in deep-learning-based automatic speech recognition (ASR) systems. First, the value of non-causal context using a non-causal FIR filter is shown by comparing the contributions of previous vs. future information. Second, MLP- and LSTM-based dereverberation networks were trained to confirm the effects of causal and non-causal context when used in ASR systems trained with clean speech. The non-causal deep-learning-based dereverberation provides a 45% relative reduction in word error rate (WER) compared to the popular weighted prediction error (WPE) method in experiments with clean training in the REVERB challenge. Finally, an expanded multicondition training procedure used in combination with a semi-enhanced test utterance generation based on combinations of reverberated and dereverberated signals is proposed to reduce any artifacts or distortion that may be introduced by the non-causal dereverberation methods. The combination of both approaches provided average relative reductions in WER equal to 10.9% and 6.0% when compared to the baseline system obtained with the most recent REVERB challenge recipe without and with WPE, respectively.
Submitted 6 September, 2020;
originally announced September 2020.
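
The relative WER reductions quoted in the abstract above follow the usual definition: the drop in WER divided by the baseline WER. A minimal sketch of that arithmetic in Python is given below. The function name and the example numbers are illustrative assumptions, not values or code from the paper.

```python
def relative_wer_reduction(baseline_wer, system_wer):
    """Return the relative WER reduction, in percent, of a system over a baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


# Hypothetical WERs (in percent), chosen only to illustrate the arithmetic.
reduction = relative_wer_reduction(20.0, 11.0)
print(f"relative reduction: {reduction:.1f}%")
```

Note that a relative reduction differs from an absolute one: in the hypothetical case above the absolute drop is 9 percentage points.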
-
On combining features for single-channel robust speech recognition in reverberant environments
Authors:
José Novoa,
Josué Fredes,
Jorge Wuth,
Fernando Huenupán,
Richard M. Stern,
Nestor Becerra Yoma
Abstract:
This paper addresses the combination of complementary parallel speech recognition systems to reduce the error rate of speech recognition systems operating in real highly-reverberant environments. First, the testing environment consists of recordings of speech in a calibrated real room with reverberation times from 0.47 to 1.77 seconds and speaker-to-microphone distances of 0.16 to 2.56 meters. We combined systems both at the level of the DNN outputs and at the level of the final ASR outputs. Second, recognition experiments with the REVERB challenge are also reported. The results presented here show that the combination of features can lead to WER improvements between 7% and 18% with speech recorded in real reverberant environments. Also, the combination at the DNN-output level is much more effective than at the system-output level. However, cascading both schemes can still lead to smaller reductions in WER.
Submitted 17 June, 2019;
originally announced June 2019.
-
Weighted delay-and-sum beamforming guided by visual tracking for human-robot interaction
Authors:
José Novoa,
Rodrigo Mahu,
Alejandro Díaz,
Jorge Wuth,
Richard Stern,
Nestor Becerra Yoma
Abstract:
This paper describes the integration of weighted delay-and-sum beamforming with speech source localization using image processing and robot head visual servoing for source tracking. We take into consideration the fact that the directivity gain provided by the beamforming depends on the angular distance between its main lobe and the main response axis of the microphone array. A visual servoing scheme is used to reduce the angular distance between the center of the video frame of a robot camera and a target object. Additionally, the beamforming strategy presented combines two information sources: the direction of the target object obtained with image processing and the audio signals provided by a microphone array. These sources of information were integrated by making use of a weighted delay-and-sum beamforming method. Experiments were carried out with a real mobile robotic testbed built with a PR2 robot. Static and dynamic robot head conditions, as well as the use of one and two external noise sources, were considered. The results presented here show that the appropriate integration of visual source tracking with visual servoing and a beamforming method can lead to a reduction in WER as high as 34% compared to beamforming alone.
Submitted 17 June, 2019;
originally announced June 2019.
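
The general idea of weighted delay-and-sum beamforming named in the abstract above can be sketched in Python as follows. This is a generic textbook-style illustration, not the authors' implementation: it assumes integer sample delays that are already known, and the per-channel weights are hypothetical inputs, since the abstract does not say how the weights are derived from the visual direction estimate.

```python
def delay_and_sum(channels, delays, weights=None):
    """Align each channel by its integer sample delay and form a weighted average.

    channels: list of equal-length lists of samples, one per microphone.
    delays:   integer number of samples by which the source arrives late
              at each microphone (assumed known here).
    weights:  optional non-negative per-channel weights (hypothetical);
              uniform weighting is used if omitted.
    """
    if weights is None:
        weights = [1.0] * len(channels)
    total = sum(weights)
    n = len(channels[0])
    out = [0.0] * n
    for ch, d, w in zip(channels, delays, weights):
        for t in range(n):
            src = t + d  # advance the channel by its delay to align it
            if 0 <= src < n:
                out[t] += (w / total) * ch[src]
    return out


# Toy example: the same pulse reaches microphone 1 two samples later.
mic0 = [0, 0, 0, 1, 0, 0, 0, 0]
mic1 = [0, 0, 0, 0, 0, 1, 0, 0]
aligned = delay_and_sum([mic0, mic1], delays=[0, 2], weights=[1.0, 3.0])
print(aligned)
```

After alignment the pulses add coherently, whereas uncorrelated noise on the channels would be averaged down. Samples shifted in from outside the recording are treated as zero in this sketch.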
-
An improved DNN-based spectral feature mapping that removes noise and reverberation for robust automatic speech recognition
Authors:
Juan Pablo Escudero,
José Novoa,
Rodrigo Mahu,
Jorge Wuth,
Fernando Huenupán,
Richard Stern,
Néstor Becerra Yoma
Abstract:
Reverberation and additive noise have detrimental effects on the performance of automatic speech recognition systems. In this paper we explore the ability of a DNN-based spectral feature mapping to remove the effects of reverberation and additive noise. Experiments with the CHiME-2 database show that this DNN can achieve an average reduction in WER of 4.5%, when compared to the baseline system, at SNRs equal to -6 dB, -3 dB, 0 dB and 3 dB, and just 0.8% at greater SNRs of 6 dB and 9 dB. These results suggest that this DNN is more effective in removing additive noise than reverberation. To improve the DNN performance, we combine it with the weighted prediction error (WPE) method that shows a complementary behavior. While this combination provided a reduction in WER of approximately 11% when compared with the baseline, the observed improvement is not as great as that obtained using WPE alone. However, modifications to the DNN training process were applied and an average reduction in WER equal to 18.3% was achieved when compared with the baseline system. Furthermore, the improved DNN combined with WPE achieves a reduction in WER of 7.9% when compared with WPE alone.
Submitted 3 April, 2018; v1 submitted 23 March, 2018;
originally announced March 2018.
-
Exploring the robustness of features and enhancement on speech recognition systems in highly-reverberant real environments
Authors:
José Novoa,
Juan Pablo Escudero,
Jorge Wuth,
Victor Poblete,
Simon King,
Richard Stern,
Néstor Becerra Yoma
Abstract:
This paper evaluates the robustness of a DNN-HMM-based speech recognition system in highly-reverberant real environments using the HRRE database. The performance of locally-normalized filter bank (LNFB) and Mel filter bank (MelFB) features in combination with the Non-negative Matrix Factorization (NMF), Suppression of Slowly-varying components and the Falling edge (SSF), and Weighted Prediction Error (WPE) enhancement methods is discussed and evaluated. Two training conditions were considered: clean and reverberated (Reverb). With Reverb training, the use of WPE and LNFB provides WERs that are 3% and 20% lower on average than SSF and NMF, respectively. WPE and MelFB provide WERs that are 11% and 24% lower on average than SSF and NMF, respectively. With clean training, which represents a significant mismatch between testing and training conditions, LNFB features clearly outperform MelFB features. The results show that different types of training, parametrization, and enhancement techniques may work better for a specific combination of speaker-microphone distance and reverberation time. This suggests that there could be some degree of complementarity between systems trained with different enhancement and parametrization methods.
Submitted 23 March, 2018;
originally announced March 2018.
-
Highly-Reverberant Real Environment database: HRRE
Authors:
Juan Pablo Escudero,
Victor Poblete,
José Novoa,
Jorge Wuth,
Josué Fredes,
Rodrigo Mahu,
Richard Stern,
Néstor Becerra Yoma
Abstract:
Speech recognition in highly-reverberant real environments remains a major challenge. An evaluation dataset for this task is needed. This report describes the generation of the Highly-Reverberant Real Environment database (HRRE). This database contains 13.4 hours of data recorded in real reverberant environments and consists of 20 different testing conditions which consider a wide range of reverberation times and speaker-to-microphone distances. These evaluation sets were generated by re-recording the clean test set of the Aurora-4 database which corresponds to five loudspeaker-microphone distances in four reverberant conditions.
Submitted 23 March, 2018; v1 submitted 29 January, 2018;
originally announced January 2018.
-
Multichannel Robot Speech Recognition Database: MChRSR
Authors:
José Novoa,
Juan Pablo Escudero,
Josué Fredes,
Jorge Wuth,
Rodrigo Mahu,
Néstor Becerra Yoma
Abstract:
In real human-robot interaction (HRI) scenarios, speech recognition represents a major challenge due to robot noise, background noise and the time-varying acoustic channel. This document describes the procedure used to obtain the Multichannel Robot Speech Recognition Database (MChRSR). It is composed of 12 hours of multichannel evaluation data recorded in a real mobile HRI scenario. This database was recorded with a PR2 robot performing different translational and azimuthal movements. Accordingly, 16 evaluation sets were obtained by re-recording the clean set of the Aurora-4 database in different movement conditions.
Submitted 29 December, 2017;
originally announced January 2018.
-
Improved signal detection algorithms for unevenly sampled data. Six signals in the radial velocity data for GJ876
Authors:
James S. Jenkins,
Nestor Becerra Yoma,
Patricio Rojo,
Rodrigo Mahu,
Jorge Wuth
Abstract:
The hunt for Earth analogue planets orbiting Sun-like stars has forced the introduction of novel methods to detect signals at, or below, the level of the intrinsic noise of the observations. We present a new global periodogram method that returns more information than the classic Lomb-Scargle periodogram method for radial velocity signal detection. Our method uses the Minimum Mean Squared Error as a framework to determine the optimal number of genuine signals present in a radial velocity timeseries using a global search algorithm, meaning we can discard noise spikes from the data before follow-up analysis. This method also allows us to determine the phase and amplitude of the signals we detect, meaning we can track these quantities as a function of time to test if the signals are stationary or non-stationary. We apply our method to the radial velocity data for GJ876 as a test system to highlight how the phase information can be used to select against non-stationary sources of detected signals in radial velocity data, such as rotational modulation of star spots. Analysis of this system yields two new statistically significant signals in the combined Keck and HARPS velocities with periods of 10 and 15 days. Although a planet with a period of 15 days would relate to a Laplace resonant chain configuration with three of the other planets (8:4:2:1), we stress that follow-up dynamical analyses are needed to test the reliability of such a six planet system.
Submitted 3 April, 2014; v1 submitted 29 March, 2014;
originally announced March 2014.
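
To illustrate how amplitude and phase can be recovered from unevenly sampled data, the Python sketch below fits a single sinusoid by least squares at each trial period and keeps the period with the smallest mean squared error. This is a simplified stand-in, not the global MMSE periodogram of the paper: it assumes one signal, a mean-subtracted series, and a hand-picked list of candidate periods. All names and numbers are hypothetical.

```python
import math


def fit_sinusoid(times, values, period):
    """Least-squares fit of A*cos(w*t - phase) to unevenly sampled data.

    Assumes the series is already mean-subtracted.
    Returns (amplitude, phase, mean squared error).
    """
    w = 2.0 * math.pi / period
    c = [math.cos(w * t) for t in times]
    s = [math.sin(w * t) for t in times]
    cc = sum(ci * ci for ci in c)
    ss = sum(si * si for si in s)
    cs = sum(ci * si for ci, si in zip(c, s))
    yc = sum(yi * ci for yi, ci in zip(values, c))
    ys = sum(yi * si for yi, si in zip(values, s))
    det = cc * ss - cs * cs
    # Solve the 2x2 normal equations for the model a*cos(w*t) + b*sin(w*t).
    a = (yc * ss - ys * cs) / det
    b = (ys * cc - yc * cs) / det
    residuals = [yi - a * ci - b * si for yi, ci, si in zip(values, c, s)]
    mse = sum(r * r for r in residuals) / len(values)
    return math.hypot(a, b), math.atan2(b, a), mse


def best_period(times, values, candidate_periods):
    """Return the candidate period whose sinusoid fit has the lowest MSE."""
    return min(candidate_periods, key=lambda p: fit_sinusoid(times, values, p)[2])


# Synthetic, unevenly sampled test signal (hypothetical values).
times = [0.0, 1.3, 2.1, 4.7, 7.9, 8.2, 11.5, 16.0, 19.3, 23.8]
values = [2.0 * math.cos(2.0 * math.pi * t / 15.0 - 0.5) for t in times]
print(best_period(times, values, [5.0, 10.0, 15.0, 20.0]))
print(fit_sinusoid(times, values, 15.0))
```

Repeating such a fit on successive subsets of the observations would give amplitude and phase as a function of time, which is the kind of tracking the abstract describes for separating stationary from non-stationary signals.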