Search | arXiv e-print repository

Impact of Speech Mode in Automatic Pathological Speech Detection

Abstract: Automatic pathological speech detection approaches yield promising results in identifying various pathologies. These approaches are typically designed and evaluated for phonetically-controlled speech scenarios, where speakers are prompted to articulate identical phonetic content. While gathering controlled speech recordings can be laborious, spontaneous speech can be conveniently acquired as poten… ▽ More Automatic pathological speech detection approaches yield promising results in identifying various pathologies. These approaches are typically designed and evaluated for phonetically-controlled speech scenarios, where speakers are prompted to articulate identical phonetic content. While gathering controlled speech recordings can be laborious, spontaneous speech can be conveniently acquired as potential patients navigate their daily routines. Further, spontaneous speech can be valuable in detecting subtle and abstract cues of pathological speech. Nonetheless, the efficacy of automatic pathological speech detection for spontaneous speech remains unexplored. This paper analyzes the influence of speech mode on pathological speech detection approaches, examining two distinct categories of approaches, i.e., classical machine learning and deep learning. Results indicate that classical approaches may struggle to capture pathology-discriminant cues in spontaneous speech. In contrast, deep learning approaches demonstrate superior performance, managing to extract additional cues that were previously inaccessible in non-spontaneous speech △ Less

Submitted 14 June, 2024; originally announced June 2024.

Comments: Accepted in EUSIPCO 2024

arXiv:2406.02572 [pdf, other]

Selfsupervised learning for pathological speech detection

Authors: Shakeel Ahmad Sheikh

Abstract: Speech production is a complex phenomenon, wherein the brain orchestrates a sequence of processes involving thought processing, motor planning, and the execution of articulatory movements. However, this intricate execution of various processes is susceptible to influence and disruption by various neurodegenerative pathological speech disorders, such as Parkinsons' disease, resulting in dysarthria,… ▽ More Speech production is a complex phenomenon, wherein the brain orchestrates a sequence of processes involving thought processing, motor planning, and the execution of articulatory movements. However, this intricate execution of various processes is susceptible to influence and disruption by various neurodegenerative pathological speech disorders, such as Parkinsons' disease, resulting in dysarthria, apraxia, and other conditions. These disorders lead to pathological speech characterized by abnormal speech patterns and imprecise articulation. Diagnosing these speech disorders in clinical settings typically involves auditory perceptual tests, which are time-consuming, and the diagnosis can vary among clinicians based on their experiences, biases, and cognitive load during the diagnosis. Additionally, unlike neurotypical speakers, patients with speech pathologies or impairments are unable to access various virtual assistants such as Alexa, Siri, etc. To address these challenges, several automatic pathological speech detection (PSD) approaches have been proposed. These approaches aim to provide efficient and accurate detection of speech disorders, thereby facilitating timely intervention and support for individuals affected by these conditions. These approaches mainly vary in two aspects: the input representations utilized and the classifiers employed. Due to the limited availability of data, the performance of detection remains subpar. Self-supervised learning (SSL) embeddings, such as wav2vec2, and their multilingual versions, are being explored as a promising avenue to improve performance. These embeddings leverage self-supervised learning techniques to extract rich representations from audio data, thereby offering a potential solution to address the limitations posed by the scarcity of labeled data. △ Less

Submitted 16 May, 2024; originally announced June 2024.

Comments: in Intersection of Book Chapter in Machine Leanring and Computational Social Sciences CRC (in progress) 2024

arXiv:2306.00689 [pdf, other]

Stuttering Detection Using Speaker Representations and Self-supervised Contextual Embeddings

Authors: Shakeel A. Sheikh, Md Sahidullah, Fabrice Hirsch, Slim Ouni

Abstract: The adoption of advanced deep learning architectures in stuttering detection (SD) tasks is challenging due to the limited size of the available datasets. To this end, this work introduces the application of speech embeddings extracted from pre-trained deep learning models trained on large audio datasets for different tasks. In particular, we explore audio representations obtained using emphasized… ▽ More The adoption of advanced deep learning architectures in stuttering detection (SD) tasks is challenging due to the limited size of the available datasets. To this end, this work introduces the application of speech embeddings extracted from pre-trained deep learning models trained on large audio datasets for different tasks. In particular, we explore audio representations obtained using emphasized channel attention, propagation, and aggregation time delay neural network (ECAPA-TDNN) and Wav2Vec2.0 models trained on VoxCeleb and LibriSpeech datasets respectively. After extracting the embeddings, we benchmark with several traditional classifiers, such as the K-nearest neighbour (KNN), Gaussian naive Bayes, and neural network, for the SD tasks. In comparison to the standard SD systems trained only on the limited SEP-28k dataset, we obtain a relative improvement of 12.08%, 28.71%, 37.9% in terms of unweighted average recall (UAR) over the baselines. Finally, we have shown that combining two embeddings and concatenating multiple layers of Wav2Vec2.0 can further improve the UAR by up to 2.60% and 6.32% respectively. △ Less

Submitted 1 June, 2023; originally announced June 2023.

Comments: Accepted in International Journal of Speech Technology, Springer 2023 substantial overlap with arXiv:2204.01564

arXiv:2302.11343 [pdf, other]

Advancing Stuttering Detection via Data Augmentation, Class-Balanced Loss and Multi-Contextual Deep Learning

Authors: Shakeel A. Sheikh, Md Sahidullah, Fabrice Hirsch, Slim Ouni

Abstract: Stuttering is a neuro-developmental speech impairment characterized by uncontrolled utterances (interjections) and core behaviors (blocks, repetitions, and prolongations), and is caused by the failure of speech sensorimotors. Due to its complex nature, stuttering detection (SD) is a difficult task. If detected at an early stage, it could facilitate speech therapists to observe and rectify the spee… ▽ More Stuttering is a neuro-developmental speech impairment characterized by uncontrolled utterances (interjections) and core behaviors (blocks, repetitions, and prolongations), and is caused by the failure of speech sensorimotors. Due to its complex nature, stuttering detection (SD) is a difficult task. If detected at an early stage, it could facilitate speech therapists to observe and rectify the speech patterns of persons who stutter (PWS). The stuttered speech of PWS is usually available in limited amounts and is highly imbalanced. To this end, we address the class imbalance problem in the SD domain via a multibranching (MB) scheme and by weighting the contribution of classes in the overall loss function, resulting in a huge improvement in stuttering classes on the SEP-28k dataset over the baseline (StutterNet). To tackle data scarcity, we investigate the effectiveness of data augmentation on top of a multi-branched training scheme. The augmented training outperforms the MB StutterNet (clean) by a relative margin of 4.18% in macro F1-score (F1). In addition, we propose a multi-contextual (MC) StutterNet, which exploits different contexts of the stuttered speech, resulting in an overall improvement of 4.48% in F 1 over the single context based MB StutterNet. Finally, we have shown that applying data augmentation in the cross-corpora scenario can improve the overall SD performance by a relative margin of 13.23% in F1 over the clean training. △ Less

Submitted 21 February, 2023; originally announced February 2023.

Comments: Accepted in IEEE Journal of Biomedical Health Informatics 2023

arXiv:2207.10817 [pdf, other]

End-to-End and Self-Supervised Learning for ComParE 2022 Stuttering Sub-Challenge

Authors: Shakeel Ahmad Sheikh, Md Sahidullah, Fabrice Hirsch, Slim Ouni

Abstract: In this paper, we present end-to-end and speech embedding based systems trained in a self-supervised fashion to participate in the ACM Multimedia 2022 ComParE Challenge, specifically the stuttering sub-challenge. In particular, we exploit the embeddings from the pre-trained Wav2Vec2.0 model for stuttering detection (SD) on the KSoF dataset. After embedding extraction, we benchmark with several met… ▽ More In this paper, we present end-to-end and speech embedding based systems trained in a self-supervised fashion to participate in the ACM Multimedia 2022 ComParE Challenge, specifically the stuttering sub-challenge. In particular, we exploit the embeddings from the pre-trained Wav2Vec2.0 model for stuttering detection (SD) on the KSoF dataset. After embedding extraction, we benchmark with several methods for SD. Our proposed self-supervised based SD system achieves a UAR of 36.9% and 41.0% on validation and test sets respectively, which is 31.32% (validation set) and 1.49% (test set) higher than the best (DeepSpectrum) challenge baseline (CBL). Moreover, we show that concatenating layer embeddings with Mel-frequency cepstral coefficients (MFCCs) features further improves the UAR of 33.81% and 5.45% on validation and test sets respectively over the CBL. Finally, we demonstrate that the summing information across all the layers of Wav2Vec2.0 surpasses the CBL by a relative margin of 45.91% and 5.69% on validation and test sets respectively. Grand-challenge: Computational Paralinguistics ChallengE △ Less

Submitted 20 July, 2022; originally announced July 2022.

Comments: Accepted in ACM MM 2022 Conference : Grand Challenges, "\c{opyright} {Owner/Author | ACM} {2022}. This is the author's version of the work. It is posted here for your personal use. Not for redistribution

arXiv:2204.01735 [pdf, other]

Robust Stuttering Detection via Multi-task and Adversarial Learning

Authors: Shakeel Ahmad Sheikh, Md Sahidullah, Fabrice Hirsch, Slim Ouni

Abstract: By automatic detection and identification of stuttering, speech pathologists can track the progression of disfluencies of persons who stutter (PWS). In this paper, we investigate the impact of multi-task (MTL) and adversarial learning (ADV) to learn robust stutter features. This is the first-ever preliminary study where MTL and ADV have been employed in stuttering identification (SI). We evaluate… ▽ More By automatic detection and identification of stuttering, speech pathologists can track the progression of disfluencies of persons who stutter (PWS). In this paper, we investigate the impact of multi-task (MTL) and adversarial learning (ADV) to learn robust stutter features. This is the first-ever preliminary study where MTL and ADV have been employed in stuttering identification (SI). We evaluate our system on the SEP-28k stuttering dataset consisting of 20 hours (approx) of data from 385 podcasts. Our methods show promising results and outperform the baseline in various disfluency classes. We achieve up to 10%, 6.78%, and 2% improvement in repetitions, blocks, and interjections respectively over the baseline. △ Less

Submitted 4 April, 2022; originally announced April 2022.

Comments: Under Review in European Signal Processing Conference 2022

arXiv:2204.01564 [pdf, other]

Introducing ECAPA-TDNN and Wav2Vec2.0 Embeddings to Stuttering Detection

Authors: Shakeel Ahmad Sheikh, Md Sahidullah, Fabrice Hirsch, Slim Ouni

Abstract: The adoption of advanced deep learning (DL) architecture in stuttering detection (SD) tasks is challenging due to the limited size of the available datasets. To this end, this work introduces the application of speech embeddings extracted with pre-trained deep models trained on massive audio datasets for different tasks. In particular, we explore audio representations obtained using emphasized cha… ▽ More The adoption of advanced deep learning (DL) architecture in stuttering detection (SD) tasks is challenging due to the limited size of the available datasets. To this end, this work introduces the application of speech embeddings extracted with pre-trained deep models trained on massive audio datasets for different tasks. In particular, we explore audio representations obtained using emphasized channel attention, propagation, and aggregation-time-delay neural network (ECAPA-TDNN) and Wav2Vec2.0 model trained on VoxCeleb and LibriSpeech datasets respectively. After extracting the embeddings, we benchmark with several traditional classifiers, such as a k-nearest neighbor, Gaussian naive Bayes, and neural network, for the stuttering detection tasks. In comparison to the standard SD system trained only on the limited SEP-28k dataset, we obtain a relative improvement of 16.74% in terms of overall accuracy over baseline. Finally, we have shown that combining two embeddings and concatenating multiple layers of Wav2Vec2.0 can further improve SD performance up to 1% and 2.64% respectively. △ Less

Submitted 4 April, 2022; originally announced April 2022.

Comments: Submitted to Interspeech 2022

arXiv:2203.14873 [pdf]

doi 10.1016/j.jss.2006.09.010

Institutionalization of Software Product Line: An Empirical Investigation of Key Organizational Factors

Authors: Faheem Ahmed, Luiz Fernando Capretz, Shahbaz Ali Sheikh

Abstract: A good fit between the person and the organization is essential in a better organizational performance. This is even more crucial in case of institutionalization of a software product line practice within an organization. Employees participation, organizational behavior and management contemplation play a vital role in successfully institutionalizing software product lines in a firm. Organizationa… ▽ More A good fit between the person and the organization is essential in a better organizational performance. This is even more crucial in case of institutionalization of a software product line practice within an organization. Employees participation, organizational behavior and management contemplation play a vital role in successfully institutionalizing software product lines in a firm. Organizational dimension has been weighted as one of the critical dimensions in software product line theory and practice. A comprehensive empirical investigation to study the impact of some organizational factors on the performance of software product line practice is presented in this work. This is the first study to empirically investigate and demonstrate the relationships between some of the key organizational factors and software product line performance of an organization. The results of this investigation provide empirical evidence and further support the theoretical foundations that in order to institutionalize software product lines within an organization, organizational factors play an important role. △ Less

Submitted 28 March, 2022; originally announced March 2022.

Comments: 13 pages

Journal ref: Journal of Systems and Software, Volume 80, Issue 6, pp. 836-849, Elsevier, June 2007

arXiv:2107.04057 [pdf, other]

Machine Learning for Stuttering Identification: Review, Challenges and Future Directions

Authors: Shakeel Ahmad Sheikh, Md Sahidullah, Fabrice Hirsch, Slim Ouni

Abstract: Stuttering is a speech disorder during which the flow of speech is interrupted by involuntary pauses and repetition of sounds. Stuttering identification is an interesting interdisciplinary domain research problem which involves pathology, psychology, acoustics, and signal processing that makes it hard and complicated to detect. Recent developments in machine and deep learning have dramatically rev… ▽ More Stuttering is a speech disorder during which the flow of speech is interrupted by involuntary pauses and repetition of sounds. Stuttering identification is an interesting interdisciplinary domain research problem which involves pathology, psychology, acoustics, and signal processing that makes it hard and complicated to detect. Recent developments in machine and deep learning have dramatically revolutionized speech domain, however minimal attention has been given to stuttering identification. This work fills the gap by trying to bring researchers together from interdisciplinary fields. In this paper, we review comprehensively acoustic features, statistical and deep learning based stuttering/disfluency classification methods. We also present several challenges and possible future directions. △ Less

Submitted 16 November, 2022; v1 submitted 8 July, 2021; originally announced July 2021.

Comments: Accepted in Journal of Neurocomputing 2022 https://doi.org/10.1016/j.neucom.2022.10.015

arXiv:2105.05599 [pdf, other]

StutterNet: Stuttering Detection Using Time Delay Neural Network

Authors: Shakeel A. Sheikh, Md Sahidullah, Fabrice Hirsch, Slim Ouni

Abstract: This paper introduces StutterNet, a novel deep learning based stuttering detection capable of detecting and identifying various types of disfluencies. Most of the existing work in this domain uses automatic speech recognition (ASR) combined with language models for stuttering detection. Compared to the existing work, which depends on the ASR module, our method relies solely on the acoustic signal.… ▽ More This paper introduces StutterNet, a novel deep learning based stuttering detection capable of detecting and identifying various types of disfluencies. Most of the existing work in this domain uses automatic speech recognition (ASR) combined with language models for stuttering detection. Compared to the existing work, which depends on the ASR module, our method relies solely on the acoustic signal. We use a time-delay neural network (TDNN) suitable for capturing contextual aspects of the disfluent utterances. We evaluate our system on the UCLASS stuttering dataset consisting of more than 100 speakers. Our method achieves promising results and outperforms the state-of-the-art residual neural network based method. The number of trainable parameters of the proposed method is also substantially less due to the parameter sharing scheme of TDNN. △ Less

Submitted 8 June, 2021; v1 submitted 12 May, 2021; originally announced May 2021.

Comments: Accepted in EUSIPCO 2021: European Signal Processing Conference

Showing 1–10 of 10 results for author: Sheikh, S A