Search | arXiv e-print repository

Representation Learning Strategies to Model Pathological Speech: Effect of Multiple Spectral Resolutions

Authors: Gabriel Figueiredo Miller, Juan Camilo Vásquez-Correa, Juan Rafael Orozco-Arroyave, Elmar Nöth

Abstract: This paper considers a representation learning strategy to model speech signals from patients with Parkinson's disease and cleft lip and palate. In particular, it compares different parametrized representation types such as wideband and narrowband spectrograms, and wavelet-based scalograms, with the goal of quantifying the representation capacity of each. Methods for quantification include the abi… ▽ More This paper considers a representation learning strategy to model speech signals from patients with Parkinson's disease and cleft lip and palate. In particular, it compares different parametrized representation types such as wideband and narrowband spectrograms, and wavelet-based scalograms, with the goal of quantifying the representation capacity of each. Methods for quantification include the ability of the proposed model to classify different pathologies and the associated disease severity. Additionally, this paper proposes a novel fusion strategy called multi-spectral fusion that combines wideband and narrowband spectral resolutions using a representation learning strategy based on autoencoders. The proposed models are able to classify the speech from Parkinson's disease patients with accuracy up to 95\%. The proposed models were also able to asses the dysarthria severity of Parkinson's disease patients with a Spearman correlation up to 0.75. These results outperform those observed in literature where the same problem was addressed with the same corpus. △ Less

Submitted 17 September, 2022; originally announced September 2022.

Comments: 7 pages, 3 figures

arXiv:2204.01677 [pdf, other]

Self-Supervised Speech Representations Preserve Speech Characteristics while Anonymizing Voices

Authors: Abner Hernandez, Paula Andrea Pérez-Toro, Juan Camilo Vásquez-Correa, Juan Rafael Orozco-Arroyave, Andreas Maier, Seung Hee Yang

Abstract: Collecting speech data is an important step in training speech recognition systems and other speech-based machine learning models. However, the issue of privacy protection is an increasing concern that must be addressed. The current study investigates the use of voice conversion as a method for anonymizing voices. In particular, we train several voice conversion models using self-supervised speech… ▽ More Collecting speech data is an important step in training speech recognition systems and other speech-based machine learning models. However, the issue of privacy protection is an increasing concern that must be addressed. The current study investigates the use of voice conversion as a method for anonymizing voices. In particular, we train several voice conversion models using self-supervised speech representations including Wav2Vec2.0, Hubert and UniSpeech. Converted voices retain a low word error rate within 1% of the original voice. Equal error rate increases from 1.52% to 46.24% on the LibriSpeech test set and from 3.75% to 45.84% on speakers from the VCTK corpus which signifies degraded performance on speaker verification. Lastly, we conduct experiments on dysarthric speech data to show that speech features relevant to articulation, prosody, phonation and phonology can be extracted from anonymized voices for discriminating between healthy and pathological speech. △ Less

Submitted 4 April, 2022; originally announced April 2022.

Comments: Submitted for review at Interspeech 2022

arXiv:2112.11514 [pdf, ps, other]

doi 10.1016/j.csl.2021.101321

The Phonetic Footprint of Parkinson's Disease

Authors: Philipp Klumpp, Tomás Arias-Vergara, Juan Camilo Vásquez-Correa, Paula Andrea Pérez-Toro, Juan Rafael Orozco-Arroyave, Anton Batliner, Elmar Nöth

Abstract: As one of the most prevalent neurodegenerative disorders, Parkinson's disease (PD) has a significant impact on the fine motor skills of patients. The complex interplay of different articulators during speech production and realization of required muscle tension become increasingly difficult, thus leading to a dysarthric speech. Characteristic patterns such as vowel instability, slurred pronunciati… ▽ More As one of the most prevalent neurodegenerative disorders, Parkinson's disease (PD) has a significant impact on the fine motor skills of patients. The complex interplay of different articulators during speech production and realization of required muscle tension become increasingly difficult, thus leading to a dysarthric speech. Characteristic patterns such as vowel instability, slurred pronunciation and slow speech can often be observed in the affected individuals and were analyzed in previous studies to determine the presence and progression of PD. In this work, we used a phonetic recognizer trained exclusively on healthy speech data to investigate how PD affected the phonetic footprint of patients. We rediscovered numerous patterns that had been described in previous contributions although our system had never seen any pathological speech previously. Furthermore, we could show that intermediate activations from the neural network could serve as feature vectors encoding information related to the disease state of individuals. We were also able to directly correlate the expert-rated intelligibility of a speaker with the mean confidence of phonetic predictions. Our results support the assumption that pathological data is not necessarily required to train systems that are capable of analyzing PD speech. △ Less

Submitted 21 December, 2021; originally announced December 2021.

Comments: https://www.sciencedirect.com/science/article/abs/pii/S0885230821001169

Journal ref: Elsevier Computer Speech and Language, Volume 72, March 2022

arXiv:2112.08261 [pdf, other]

One System to Rule them All: a Universal Intent Recognition System for Customer Service Chatbots

Authors: Juan Camilo Vasquez-Correa, Juan Carlos Guerrero-Sierra, Jose Luis Pemberty-Tamayo, Juan Esteban Jaramillo, Andres Felipe Tejada-Castro

Abstract: Customer service chatbots are conversational systems designed to provide information to customers about products/services offered by different companies. Particularly, intent recognition is one of the core components in the natural language understating capabilities of a chatbot system. Among the different intents that a chatbot is trained to recognize, there is a set of them that is universal to… ▽ More Customer service chatbots are conversational systems designed to provide information to customers about products/services offered by different companies. Particularly, intent recognition is one of the core components in the natural language understating capabilities of a chatbot system. Among the different intents that a chatbot is trained to recognize, there is a set of them that is universal to any customer service chatbot. Universal intents may include salutation, switch the conversation to a human agent, farewells, among others. A system to recognize those universal intents will be very helpful to optimize the training process of specific customer service chatbots. We propose the development of a universal intent recognition system, which is trained to recognize a selected group of 11 intents that are common in 28 different chatbots. The proposed system is trained considering state-of-the-art word-embedding models such as word2vec and BERT, and deep classifiers based on convolutional and recurrent neural networks. The proposed model is able to discriminate between those universal intents with a balanced accuracy up to 80.4\%. In addition, the proposed system is equally accurate to recognize intents expressed both in short and long text requests. At the same time, misclassification errors often occurs between intents with very similar semantic fields such as farewells and positive comments. The proposed system will be very helpful to optimize the training process of a customer service chatbot because some of the intents will be already available and detected by our system. At the same time, the proposed approach will be a suitable base model to train more specific chatbots by applying transfer learning strategies. △ Less

Submitted 15 December, 2021; originally announced December 2021.

arXiv:2107.02759 [pdf, other]

Gender Recognition in Informal and Formal Language Scenarios via Transfer Learning

Authors: Daniel Escobar-Grisales, Juan Camilo Vasquez-Correa, Juan Rafael Orozco-Arroyave

Abstract: The interest in demographic information retrieval based on text data has increased in the research community because applications have shown success in different sectors such as security, marketing, heath-care, and others. Recognition and identification of demographic traits such as gender, age, location, or personality based on text data can help to improve different marketing strategies. For ins… ▽ More The interest in demographic information retrieval based on text data has increased in the research community because applications have shown success in different sectors such as security, marketing, heath-care, and others. Recognition and identification of demographic traits such as gender, age, location, or personality based on text data can help to improve different marketing strategies. For instance it makes it possible to segment and to personalize offers, thus products and services are exposed to the group of greatest interest. This type of technology has been discussed widely in documents from social media. However, the methods have been poorly studied in data with a more formal structure, where there is no access to emoticons, mentions, and other linguistic phenomena that are only present in social media. This paper proposes the use of recurrent and convolutional neural networks, and a transfer learning strategy for gender recognition in documents that are written in informal and formal languages. Models are tested in two different databases consisting of Tweets and call-center conversations. Accuracies of up to 75\% are achieved for both databases. The results also indicate that it is possible to transfer the knowledge from a system trained on a specific type of expressions or idioms such as those typically used in social media into a more formal type of text data, where the amount of data is more scarce and its structure is completely different. △ Less

Submitted 23 June, 2021; originally announced July 2021.

arXiv:2002.05412 [pdf, other]

Comparison of user models based on GMM-UBM and i-vectors for speech, handwriting, and gait assessment of Parkinson's disease patients

Authors: J. C. Vasquez-Correa, T. Bocklet, J. R. Orozco-Arroyave, E. Nöth

Abstract: Parkinson's disease is a neurodegenerative disorder characterized by the presence of different motor impairments. Information from speech, handwriting, and gait signals have been considered to evaluate the neurological state of the patients. On the other hand, user models based on Gaussian mixture models - universal background models (GMM-UBM) and i-vectors are considered the state-of-the-art in b… ▽ More Parkinson's disease is a neurodegenerative disorder characterized by the presence of different motor impairments. Information from speech, handwriting, and gait signals have been considered to evaluate the neurological state of the patients. On the other hand, user models based on Gaussian mixture models - universal background models (GMM-UBM) and i-vectors are considered the state-of-the-art in biometric applications like speaker verification because they are able to model specific speaker traits. This study introduces the use of GMM-UBM and i-vectors to evaluate the neurological state of Parkinson's patients using information from speech, handwriting, and gait. The results show the importance of different feature sets from each type of signal in the assessment of the neurological state of the patients. △ Less

Submitted 13 February, 2020; originally announced February 2020.

Comments: Proceedings of ICASSP (2019)

arXiv:2002.05411 [pdf, other]

doi 10.1016/j.cmpb.2019.03.005

Analysis and Evaluation of Handwriting in Patients with Parkinson's Disease Using kinematic, Geometrical, and Non-linear Features

Authors: C. D. Rios-Urrego, J. C. Vásquez-Correa, J. F. Vargas-Bonilla, E. Nöth, F. Lopera, J. R. Orozco-Arroyave

Abstract: Background and objectives: Parkinson's disease is a neurological disorder that affects the motor system producing lack of coordination, resting tremor, and rigidity. Impairments in handwriting are among the main symptoms of the disease. Handwriting analysis can help in supporting the diagnosis and in monitoring the progress of the disease. This paper aims to evaluate the importance of different gr… ▽ More Background and objectives: Parkinson's disease is a neurological disorder that affects the motor system producing lack of coordination, resting tremor, and rigidity. Impairments in handwriting are among the main symptoms of the disease. Handwriting analysis can help in supporting the diagnosis and in monitoring the progress of the disease. This paper aims to evaluate the importance of different groups of features to model handwriting deficits that appear due to Parkinson's disease; and how those features are able to discriminate between Parkinson's disease patients and healthy subjects. Methods: Features based on kinematic, geometrical and non-linear dynamics analyses were evaluated to classify Parkinson's disease and healthy subjects. Classifiers based on K-nearest neighbors, support vector machines, and random forest were considered. Results: Accuracies of up to $93.1\%$ were obtained in the classification of patients and healthy control subjects. A relevance analysis of the features indicated that those related to speed, acceleration, and pressure are the most discriminant. The automatic classification of patients in different stages of the disease shows $κ$ indexes between $0.36$ and $0.44$. Accuracies of up to $83.3\%$ were obtained in a different dataset used only for validation purposes. Conclusions: The results confirmed the negative impact of aging in the classification process when we considered different groups of healthy subjects. In addition, the results reported with the separate validation set comprise a step towards the development of automated tools to support the diagnosis process in clinical practice. △ Less

Submitted 13 February, 2020; originally announced February 2020.

Journal ref: Computer methods and programs in biomedicine, 173, 43-52 (2019)

arXiv:2002.04374 [pdf, other]

doi 10.1007/978-3-030-33904-3_66

Convolutional Neural Networks and a Transfer Learning Strategy to Classify Parkinson's Disease from Speech in Three Different Languages

Authors: J. C. Vásquez-Correa, T. Arias-Vergara, C. D. Rios-Urrego, M. Schuster, J. Rusz, J. R. Orozco-Arroyave, E. Nöth

Abstract: Parkinson's disease patients develop different speech impairments that affect their communication capabilities. The automatic assessment of the speech of the patients allows the development of computer aided tools to support the diagnosis and the evaluation of the disease severity. This paper introduces a methodology to classify Parkinson's disease from speech in three different languages: Spanish… ▽ More Parkinson's disease patients develop different speech impairments that affect their communication capabilities. The automatic assessment of the speech of the patients allows the development of computer aided tools to support the diagnosis and the evaluation of the disease severity. This paper introduces a methodology to classify Parkinson's disease from speech in three different languages: Spanish, German, and Czech. The proposed approach considers convolutional neural networks trained with time frequency representations and a transfer learning strategy among the three languages. The transfer learning scheme aims to improve the accuracy of the models when the weights of the neural network are initialized with utterances from a different language than the used for the test set. The results suggest that the proposed strategy improves the accuracy of the models in up to 8\% when the base model used to initialize the weights of the classifier is robust enough. In addition, the results obtained after the transfer learning are in most cases more balanced in terms of specificity-sensitivity than those trained without the transfer learning strategy. △ Less

Submitted 11 February, 2020; originally announced February 2020.

Journal ref: In Iberoamerican Congress on Pattern Recognition (pp. 697-706) 2019

Showing 1–8 of 8 results for author: Vasquez-Correa, J C