Exploring Spoken Named Entity Recognition: A Cross-Lingual Perspective
Authors:
Moncef Benaicha,
David Thulke,
M. A. Tuğtekin Turan
Abstract:
Recent advancements in Named Entity Recognition (NER) have significantly improved the identification of entities in textual data. However, spoken NER, a specialized field of spoken document retrieval, lags behind because of limited research and scarce datasets. Moreover, cross-lingual transfer learning in spoken NER has remained unexplored. This paper applies transfer learning across Dutch, English, and German using pipeline and End-to-End (E2E) schemes. We employ Wav2Vec2-XLS-R models on custom pseudo-annotated datasets and investigate several architectures to assess the adaptability of cross-lingual systems. Our results demonstrate that E2E spoken NER outperforms pipeline-based alternatives on our limited annotations. Notably, transfer learning from German to Dutch surpasses the Dutch E2E system by 7% and the Dutch pipeline system by 4%. This study not only underscores the feasibility of transfer learning in spoken NER but also provides promising results for future evaluations, and points to the need for more comprehensive data collection to improve on them.
Submitted 3 July, 2023;
originally announced July 2023.
Enhancement of Throat Microphone Recordings Using Gaussian Mixture Model Probabilistic Estimator
Authors:
Mehmet Ali Tugtekin Turan
Abstract:
The throat microphone is a body-attached transducer that is worn against the neck. It captures the signals that are transmitted through the vocal folds, along with the buzz tone of the larynx. Because of its skin contact, it is more robust to environmental noise than the acoustic microphone, which picks up vibrations through air pressure and therefore also picks up all surrounding interference. Throat speech is partly intelligible, but sounds unnatural and croaky. This thesis aims to recover the missing frequency bands of throat speech and investigates the envelope and excitation mapping problem through joint analysis of throat- and acoustic-microphone recordings. A new phone-dependent GMM-based spectral envelope mapping scheme, which performs the minimum mean square error (MMSE) estimation of the acoustic-microphone spectral envelope, is proposed. In the source-filter decomposition framework, we observed that the spectral envelope difference of the excitation signals of throat- and acoustic-microphone recordings is an important source of the degradation in throat-microphone voice quality. Thus, we also model the spectral envelope difference of the excitation signals as a spectral tilt vector, and propose a new phone-dependent GMM-based spectral tilt mapping scheme to enhance the throat excitation signal. The proposed mapping schemes are compared using both objective and subjective evaluations. Objective evaluations are performed with the log-spectral distortion (LSD) and the wide-band perceptual evaluation of speech quality (PESQ) metrics. Subjective evaluations are performed with an A/B pair comparison listening test. Both objective and subjective evaluations show that the proposed phone-dependent mapping consistently improves performance over state-of-the-art GMM estimators.
Submitted 13 April, 2018;
originally announced April 2018.
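The abstract above names log-spectral distortion (LSD) as one of its objective metrics without giving a formula. The following is a minimal sketch of one common definition (per-frame RMS of the dB difference between two power spectra, averaged over frames); it is not necessarily the exact formulation used in the thesis. The function name `log_spectral_distortion`, the `eps` guard, and the list-of-lists input layout are assumptions made for illustration.

```python
import math


def log_spectral_distortion(ref_frames, est_frames, eps=1e-12):
    """Average per-frame LSD in dB between two sequences of power spectra.

    ref_frames, est_frames: lists of frames, each frame a list of
    non-negative power values, one per frequency bin (assumed layout).
    eps: small constant (assumption) that avoids log of zero.
    """
    if not ref_frames or len(ref_frames) != len(est_frames):
        raise ValueError("inputs must have the same non-zero number of frames")

    per_frame = []
    for ref, est in zip(ref_frames, est_frames):
        if not ref or len(ref) != len(est):
            raise ValueError("frames must have the same non-zero number of bins")
        # Squared difference of the two spectra on a dB scale, per bin.
        squared_db = [
            (10.0 * math.log10((r + eps) / (e + eps))) ** 2
            for r, e in zip(ref, est)
        ]
        # Root mean square over frequency bins gives the LSD of this frame.
        per_frame.append(math.sqrt(sum(squared_db) / len(squared_db)))

    # Mean over frames gives a single score; lower means closer spectra.
    return sum(per_frame) / len(per_frame)
```

Under this definition, identical spectra score 0 dB, and an estimate whose power is off by a factor of ten in every bin scores about 10 dB.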