Showing 1–20 of 20 results for author: Nöth, E

  1. arXiv:2407.03132  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG eess.AS

    Speaker- and Text-Independent Estimation of Articulatory Movements and Phoneme Alignments from Speech

    Authors: Tobias Weise, Philipp Klumpp, Kubilay Can Demir, Paula Andrea Pérez-Toro, Maria Schuster, Elmar Noeth, Bjoern Heismann, Andreas Maier, Seung Hee Yang

    Abstract: This paper introduces a novel combination of two tasks, previously treated separately: acoustic-to-articulatory speech inversion (AAI) and phoneme-to-articulatory (PTA) motion estimation. We refer to this joint task as acoustic phoneme-to-articulatory speech inversion (APTAI) and explore two different approaches, both working speaker- and text-independently during inference. We use a multi-task le…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: to be published in Interspeech 2024 proceedings

  2. arXiv:2406.11025  [pdf, other]

    cs.SD cs.CL eess.AS

    Large Language Models for Dysfluency Detection in Stuttered Speech

    Authors: Dominik Wagner, Sebastian P. Bayerl, Ilja Baumann, Korbinian Riedhammer, Elmar Nöth, Tobias Bocklet

    Abstract: Accurately detecting dysfluencies in spoken language can help to improve the performance of automatic speech and language processing components and support the development of more inclusive speech and language technologies. Inspired by the recent trend towards the deployment of large language models (LLMs) as universal learners and processors of non-lexical inputs, such as audio and video, we appr…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  3. arXiv:2404.08064  [pdf]

    eess.AS cs.AI cs.CR cs.LG

    The Impact of Speech Anonymization on Pathology and Its Limits

    Authors: Soroosh Tayebi Arasteh, Tomas Arias-Vergara, Paula Andrea Perez-Toro, Tobias Weise, Kai Packhaeuser, Maria Schuster, Elmar Noeth, Andreas Maier, Seung Hee Yang

    Abstract: Integration of speech into healthcare has intensified privacy concerns due to its potential as a non-invasive biomarker containing individual biometric information. In response, speaker anonymization aims to conceal personally identifiable information while retaining crucial linguistic content. However, the application of anonymization techniques to pathological speech, a critical area where priva…

    Submitted 22 June, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

  4. arXiv:2308.08306  [pdf, other]

    eess.AS cs.SD

    Classifying Dementia in the Presence of Depression: A Cross-Corpus Study

    Authors: Franziska Braun, Sebastian P. Bayerl, Paula A. Pérez-Toro, Florian Hönig, Hartmut Lehfeld, Thomas Hillemacher, Elmar Nöth, Tobias Bocklet, Korbinian Riedhammer

    Abstract: Automated dementia screening enables early detection and intervention, reducing costs to healthcare systems and increasing quality of life for those affected. Depression has shared symptoms with dementia, adding complexity to diagnoses. The research focus so far has been on binary classification of dementia (DEM) and healthy controls (HC) using speech from picture description tests from a single d…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: Accepted at INTERSPEECH 2023

  5. arXiv:2305.19255  [pdf, other]

    eess.AS cs.CL cs.SD

    A Stutter Seldom Comes Alone -- Cross-Corpus Stuttering Detection as a Multi-label Problem

    Authors: Sebastian P. Bayerl, Dominik Wagner, Ilja Baumann, Florian Hönig, Tobias Bocklet, Elmar Nöth, Korbinian Riedhammer

    Abstract: Most stuttering detection and classification research has viewed stuttering as a multi-class classification problem or a binary detection task for each dysfluency type; however, this does not match the nature of stuttering, in which one dysfluency seldom comes alone but rather co-occurs with others. This paper explores multi-language and cross-corpus end-to-end stuttering detection as a multi-labe…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted for presentation at Interspeech 2023. arXiv admin note: substantial text overlap with arXiv:2210.15982

  6. Federated learning for secure development of AI models for Parkinson's disease detection using speech from different languages

    Authors: Soroosh Tayebi Arasteh, Cristian David Rios-Urrego, Elmar Noeth, Andreas Maier, Seung Hee Yang, Jan Rusz, Juan Rafael Orozco-Arroyave

    Abstract: Parkinson's disease (PD) is a neurological disorder impacting a person's speech. Among automatic PD assessment methods, deep learning models have gained particular interest. Recently, the community has explored cross-pathology and cross-language models which can improve diagnostic accuracy even further. However, strict patient data privacy regulations largely prevent institutions from sharing pati…

    Submitted 21 August, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: INTERSPEECH 2023, pp. 5003--5007, Dublin, Ireland

    Journal ref: INTERSPEECH 2023

  7. arXiv:2210.15982  [pdf, other]

    eess.AS cs.SD

    Dysfluencies Seldom Come Alone -- Detection as a Multi-Label Problem

    Authors: Sebastian P. Bayerl, Dominik Wagner, Florian Hönig, Tobias Bocklet, Elmar Nöth, Korbinian Riedhammer

    Abstract: Specially adapted speech recognition models are necessary to handle stuttered speech. For these to be used in a targeted manner, stuttered speech must be reliably detected. Recent works have treated stuttering as a multi-class classification problem or viewed detecting each dysfluency type as an isolated task; that does not capture the nature of stuttering, where one dysfluency seldom comes alone,…

    Submitted 28 October, 2022; originally announced October 2022.

    Comments: Submitted to ICASSP 2023

  8. arXiv:2210.15941  [pdf, other]

    eess.AS cs.SD

    Influence of Utterance and Speaker Characteristics on the Classification of Children with Cleft Lip and Palate

    Authors: Ilja Baumann, Dominik Wagner, Franziska Braun, Sebastian P. Bayerl, Elmar Nöth, Korbinian Riedhammer, Tobias Bocklet

    Abstract: Recent findings show that pre-trained wav2vec 2.0 models are reliable feature extractors for various speaker characteristics classification tasks. We show that latent representations extracted at different layers of a pre-trained wav2vec 2.0 system can be used as features for binary classification to distinguish between children with Cleft Lip and Palate (CLP) and a healthy control group. The resu…

    Submitted 1 August, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

    Comments: INTERSPEECH 2023

  9. arXiv:2210.15336  [pdf, ps, other]

    eess.AS cs.SD

    Multi-class Detection of Pathological Speech with Latent Features: How does it perform on unseen data?

    Authors: Dominik Wagner, Ilja Baumann, Franziska Braun, Sebastian P. Bayerl, Elmar Nöth, Korbinian Riedhammer, Tobias Bocklet

    Abstract: The detection of pathologies from speech features is usually defined as a binary classification task with one class representing a specific pathology and the other class representing healthy speech. In this work, we train neural networks, large margin classifiers, and tree boosting machines to distinguish between four pathologies: Parkinson's disease, laryngeal cancer, cleft lip and palate, and or…

    Submitted 1 August, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: INTERSPEECH 2023

  10. arXiv:2209.08379  [pdf, other]

    eess.AS cs.SD q-bio.QM

    Representation Learning Strategies to Model Pathological Speech: Effect of Multiple Spectral Resolutions

    Authors: Gabriel Figueiredo Miller, Juan Camilo Vásquez-Correa, Juan Rafael Orozco-Arroyave, Elmar Nöth

    Abstract: This paper considers a representation learning strategy to model speech signals from patients with Parkinson's disease and cleft lip and palate. In particular, it compares different parametrized representation types such as wideband and narrowband spectrograms, and wavelet-based scalograms, with the goal of quantifying the representation capacity of each. Methods for quantification include the abi…

    Submitted 17 September, 2022; originally announced September 2022.

    Comments: 7 pages, 3 figures

  11. arXiv:2206.03400  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    The Influence of Dataset Partitioning on Dysfluency Detection Systems

    Authors: Sebastian P. Bayerl, Dominik Wagner, Elmar Nöth, Tobias Bocklet, Korbinian Riedhammer

    Abstract: This paper empirically investigates the influence of different data splits and splitting strategies on the performance of dysfluency detection systems. For this, we perform experiments using wav2vec 2.0 models with a classification head as well as support vector machines (SVM) in conjunction with the features extracted from the wav2vec 2.0 model to detect dysfluencies. We train and evaluate the sy…

    Submitted 7 June, 2022; originally announced June 2022.

    Comments: Accepted at the 25th International Conference on Text, Speech and Dialogue (TSD 2022)

  12. arXiv:2204.06450  [pdf, other]

    cs.SD cs.LG eess.AS

    The effect of speech pathology on automatic speaker verification -- a large-scale study

    Authors: Soroosh Tayebi Arasteh, Tobias Weise, Maria Schuster, Elmar Noeth, Andreas Maier, Seung Hee Yang

    Abstract: Navigating the challenges of data-driven speech processing, one of the primary hurdles is accessing reliable pathological speech data. While public datasets appear to offer solutions, they come with inherent risks of potential unintended exposure of patient health information via re-identification attacks. Using a comprehensive real-world pathological speech corpus, with over n=3,800 test subjects…

    Submitted 22 November, 2023; v1 submitted 13 April, 2022; originally announced April 2022.

    Comments: Published in Scientific Reports

    Journal ref: Sci Rep 13, 20476 (2023)

  13. arXiv:2204.04016  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD q-bio.QM

    Disentangled Latent Speech Representation for Automatic Pathological Intelligibility Assessment

    Authors: Tobias Weise, Philipp Klumpp, Kubilay Can Demir, Andreas Maier, Elmar Noeth, Bjoern Heismann, Maria Schuster, Seung Hee Yang

    Abstract: Speech intelligibility assessment plays an important role in the therapy of patients suffering from pathological speech disorders. Automatic and objective measures are desirable to assist therapists in their traditionally subjective and labor-intensive assessments. In this work, we investigate a novel approach for obtaining such a measure using the divergence in disentangled latent speech represen…

    Submitted 27 June, 2022; v1 submitted 8 April, 2022; originally announced April 2022.

    Comments: Submitted and Accepted at INTERSPEECH2022

  14. Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0

    Authors: Sebastian P. Bayerl, Dominik Wagner, Elmar Nöth, Korbinian Riedhammer

    Abstract: Stuttering is a varied speech disorder that harms an individual's communication ability. Persons who stutter (PWS) often use speech therapy to cope with their condition. Improving speech recognition systems for people with such non-typical speech or tracking the effectiveness of speech therapy would require systems that can detect dysfluencies while at the same time being able to detect speech tec…

    Submitted 16 June, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Accepted at Interspeech 2022

  15. arXiv:2204.01670  [pdf, other]

    cs.CL cs.SD eess.AS

    Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition

    Authors: Abner Hernandez, Paula Andrea Pérez-Toro, Elmar Nöth, Juan Rafael Orozco-Arroyave, Andreas Maier, Seung Hee Yang

    Abstract: State-of-the-art automatic speech recognition (ASR) systems perform well on healthy speech. However, the performance on impaired speech still remains an issue. The current study explores the usefulness of using Wav2Vec self-supervised speech representations as features for training an ASR system for dysarthric speech. Dysarthric speech recognition is particularly difficult as several aspects of sp…

    Submitted 4 April, 2022; originally announced April 2022.

    Comments: Submitted for review at Interspeech 2022

  16. arXiv:2203.05383  [pdf, other]

    eess.AS cs.CL

    KSoF: The Kassel State of Fluency Dataset -- A Therapy Centered Dataset of Stuttering

    Authors: Sebastian P. Bayerl, Alexander Wolff von Gudenberg, Florian Hönig, Elmar Nöth, Korbinian Riedhammer

    Abstract: Stuttering is a complex speech disorder that negatively affects an individual's ability to communicate effectively. Persons who stutter (PWS) often suffer considerably under the condition and seek help through therapy. Fluency shaping is a therapy approach where PWSs learn to modify their speech to help them to overcome their stutter. Mastering such speech techniques takes time and practice, even…

    Submitted 16 June, 2022; v1 submitted 10 March, 2022; originally announced March 2022.

    Comments: Accepted at LREC 2022 Conference on Language Resources and Evaluation

  17. arXiv:2201.05912  [pdf, other]

    eess.AS cs.LG cs.SD

    Common Phone: A Multilingual Dataset for Robust Acoustic Modelling

    Authors: Philipp Klumpp, Tomás Arias-Vergara, Paula Andrea Pérez-Toro, Elmar Nöth, Juan Rafael Orozco-Arroyave

    Abstract: Current state of the art acoustic models can easily comprise more than 100 million parameters. This growing complexity demands larger training datasets to maintain a decent generalization of the final decision function. An ideal dataset is not necessarily large in size, but large with respect to the amount of unique speakers, utilized hardware and varying recording conditions. This enables a machi…

    Submitted 31 January, 2022; v1 submitted 15 January, 2022; originally announced January 2022.

    Comments: Pre-print submitted to LREC 2022 Link to Common Phone: https://zenodo.org/record/5846137

  18. arXiv:2112.11514  [pdf, ps, other]

    eess.AS cs.AI cs.LG

    The Phonetic Footprint of Parkinson's Disease

    Authors: Philipp Klumpp, Tomás Arias-Vergara, Juan Camilo Vásquez-Correa, Paula Andrea Pérez-Toro, Juan Rafael Orozco-Arroyave, Anton Batliner, Elmar Nöth

    Abstract: As one of the most prevalent neurodegenerative disorders, Parkinson's disease (PD) has a significant impact on the fine motor skills of patients. The complex interplay of different articulators during speech production and realization of required muscle tension become increasingly difficult, thus leading to a dysarthric speech. Characteristic patterns such as vowel instability, slurred pronunciati…

    Submitted 21 December, 2021; originally announced December 2021.

    Comments: https://www.sciencedirect.com/science/article/abs/pii/S0885230821001169

    Journal ref: Elsevier Computer Speech and Language, Volume 72, March 2022

  19. arXiv:2002.05412  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Comparison of user models based on GMM-UBM and i-vectors for speech, handwriting, and gait assessment of Parkinson's disease patients

    Authors: J. C. Vasquez-Correa, T. Bocklet, J. R. Orozco-Arroyave, E. Nöth

    Abstract: Parkinson's disease is a neurodegenerative disorder characterized by the presence of different motor impairments. Information from speech, handwriting, and gait signals have been considered to evaluate the neurological state of the patients. On the other hand, user models based on Gaussian mixture models - universal background models (GMM-UBM) and i-vectors are considered the state-of-the-art in b…

    Submitted 13 February, 2020; originally announced February 2020.

    Comments: Proceedings of ICASSP (2019)

  20. arXiv:2002.04374  [pdf, other]

    cs.LG cs.CL eess.AS stat.ML

    Convolutional Neural Networks and a Transfer Learning Strategy to Classify Parkinson's Disease from Speech in Three Different Languages

    Authors: J. C. Vásquez-Correa, T. Arias-Vergara, C. D. Rios-Urrego, M. Schuster, J. Rusz, J. R. Orozco-Arroyave, E. Nöth

    Abstract: Parkinson's disease patients develop different speech impairments that affect their communication capabilities. The automatic assessment of the speech of the patients allows the development of computer aided tools to support the diagnosis and the evaluation of the disease severity. This paper introduces a methodology to classify Parkinson's disease from speech in three different languages: Spanish…

    Submitted 11 February, 2020; originally announced February 2020.

    Journal ref: In Iberoamerican Congress on Pattern Recognition (pp. 697-706) 2019