Showing 1–25 of 25 results for author: Bayerl, S

Searching in archive cs.
  1. arXiv:2406.11025  [pdf, other]

    cs.SD cs.CL eess.AS

    Large Language Models for Dysfluency Detection in Stuttered Speech

    Authors: Dominik Wagner, Sebastian P. Bayerl, Ilja Baumann, Korbinian Riedhammer, Elmar Nöth, Tobias Bocklet

    Abstract: Accurately detecting dysfluencies in spoken language can help to improve the performance of automatic speech and language processing components and support the development of more inclusive speech and language technologies. Inspired by the recent trend towards the deployment of large language models (LLMs) as universal learners and processors of non-lexical inputs, such as audio and video, we appr…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  2. arXiv:2405.19970  [pdf, other]

    cs.AI

    Strategies to Counter Artificial Intelligence in Law Enforcement: Cross-Country Comparison of Citizens in Greece, Italy and Spain

    Authors: Petra Saskia Bayerl, Babak Akhgar, Ernesto La Mattina, Barbara Pirillo, Ioana Cotoi, Davide Ariu, Matteo Mauri, Jorge Garcia, Dimitris Kavallieros, Antonia Kardara, Konstantina Karagiorgou

    Abstract: This paper investigates citizens' counter-strategies to the use of Artificial Intelligence (AI) by law enforcement agencies (LEAs). Based on information from three countries (Greece, Italy and Spain) we demonstrate disparities in the likelihood of ten specific counter-strategies. We further identified factors that increase the propensity for counter-strategies. Our study provides an important new…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 20th International Conference on Information and Knowledge Engineering (IKE'21), 3 pages, 1 figure

    ACM Class: I.2.0; K.4.1

  3. arXiv:2308.08306  [pdf, other]

    eess.AS cs.SD

    Classifying Dementia in the Presence of Depression: A Cross-Corpus Study

    Authors: Franziska Braun, Sebastian P. Bayerl, Paula A. Pérez-Toro, Florian Hönig, Hartmut Lehfeld, Thomas Hillemacher, Elmar Nöth, Tobias Bocklet, Korbinian Riedhammer

    Abstract: Automated dementia screening enables early detection and intervention, reducing costs to healthcare systems and increasing quality of life for those affected. Depression has shared symptoms with dementia, adding complexity to diagnoses. The research focus so far has been on binary classification of dementia (DEM) and healthy controls (HC) using speech from picture description tests from a single d…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: Accepted at INTERSPEECH 2023

  4. arXiv:2306.01786  [pdf]

    cs.CY

    Citizen Perspectives on Necessary Safeguards to the Use of AI by Law Enforcement Agencies

    Authors: Yasmine Ezzeddine, Petra Saskia Bayerl, Helen Gibson

    Abstract: In the light of modern technological advances, Artificial Intelligence (AI) is relied upon to enhance performance, increase efficiency, and maximize gains. For Law Enforcement Agencies (LEAs), it can prove valuable in optimizing evidence analysis and establishing proactive prevention measures. Nevertheless, citizens raise legitimate concerns around privacy invasions, biases, inequalities, and inac…

    Submitted 31 May, 2023; originally announced June 2023.

    Comments: CSCE 2022 Conference Proceedings

    Journal ref: Springer Nature - Book Series 2023: Transactions on Computational Science & Computational Intelligence

  5. arXiv:2305.19255  [pdf, other]

    eess.AS cs.CL cs.SD

    A Stutter Seldom Comes Alone -- Cross-Corpus Stuttering Detection as a Multi-label Problem

    Authors: Sebastian P. Bayerl, Dominik Wagner, Ilja Baumann, Florian Hönig, Tobias Bocklet, Elmar Nöth, Korbinian Riedhammer

    Abstract: Most stuttering detection and classification research has viewed stuttering as a multi-class classification problem or a binary detection task for each dysfluency type; however, this does not match the nature of stuttering, in which one dysfluency seldom comes alone but rather co-occurs with others. This paper explores multi-language and cross-corpus end-to-end stuttering detection as a multi-labe…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted for presentation at Interspeech 2023. arXiv admin note: substantial text overlap with arXiv:2210.15982

  6. Generative Models for Improved Naturalness, Intelligibility, and Voicing of Whispered Speech

    Authors: Dominik Wagner, Sebastian P. Bayerl, Hector A. Cordourier Maruri, Tobias Bocklet

    Abstract: This work adapts two recent architectures of generative models and evaluates their effectiveness for the conversion of whispered speech to normal speech. We incorporate the normal target speech into the training criterion of vector-quantized variational autoencoders (VQ-VAEs) and MelGANs, thereby conditioning the systems to recover voiced speech from whispered inputs. Objective and subjective qual…

    Submitted 30 January, 2023; v1 submitted 4 December, 2022; originally announced December 2022.

    Comments: Accepted at SLT 2022

  7. arXiv:2211.08774  [pdf, other]

    cs.SD eess.AS

    Speaker Adaptation for End-To-End Speech Recognition Systems in Noisy Environments

    Authors: Dominik Wagner, Ilja Baumann, Sebastian P. Bayerl, Korbinian Riedhammer, Tobias Bocklet

    Abstract: We analyze the impact of speaker adaptation in end-to-end automatic speech recognition models based on transformers and wav2vec 2.0 under different noise conditions. By including speaker embeddings obtained from x-vector and ECAPA-TDNN systems, as well as i-vectors, we achieve relative word error rate improvements of up to 16.3% on LibriSpeech and up to 14.5% on Switchboard. We show that the prove…

    Submitted 7 December, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

    Comments: Accepted at ASRU 2023

  8. arXiv:2210.15982  [pdf, other]

    eess.AS cs.SD

    Dysfluencies Seldom Come Alone -- Detection as a Multi-Label Problem

    Authors: Sebastian P. Bayerl, Dominik Wagner, Florian Hönig, Tobias Bocklet, Elmar Nöth, Korbinian Riedhammer

    Abstract: Specially adapted speech recognition models are necessary to handle stuttered speech. For these to be used in a targeted manner, stuttered speech must be reliably detected. Recent works have treated stuttering as a multi-class classification problem or viewed detecting each dysfluency type as an isolated task; that does not capture the nature of stuttering, where one dysfluency seldom comes alone,…

    Submitted 28 October, 2022; originally announced October 2022.

    Comments: Submitted to ICASSP 2023

  9. arXiv:2210.15941  [pdf, other]

    eess.AS cs.SD

    Influence of Utterance and Speaker Characteristics on the Classification of Children with Cleft Lip and Palate

    Authors: Ilja Baumann, Dominik Wagner, Franziska Braun, Sebastian P. Bayerl, Elmar Nöth, Korbinian Riedhammer, Tobias Bocklet

    Abstract: Recent findings show that pre-trained wav2vec 2.0 models are reliable feature extractors for various speaker characteristics classification tasks. We show that latent representations extracted at different layers of a pre-trained wav2vec 2.0 system can be used as features for binary classification to distinguish between children with Cleft Lip and Palate (CLP) and a healthy control group. The resu…

    Submitted 1 August, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

    Comments: INTERSPEECH 2023

  10. arXiv:2210.15336  [pdf, ps, other]

    eess.AS cs.SD

    Multi-class Detection of Pathological Speech with Latent Features: How does it perform on unseen data?

    Authors: Dominik Wagner, Ilja Baumann, Franziska Braun, Sebastian P. Bayerl, Elmar Nöth, Korbinian Riedhammer, Tobias Bocklet

    Abstract: The detection of pathologies from speech features is usually defined as a binary classification task with one class representing a specific pathology and the other class representing healthy speech. In this work, we train neural networks, large margin classifiers, and tree boosting machines to distinguish between four pathologies: Parkinson's disease, laryngeal cancer, cleft lip and palate, and or…

    Submitted 1 August, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: INTERSPEECH 2023

  11. An Acoustical Machine Learning Approach to Determine Abrasive Belt Wear of Wide Belt Sanders

    Authors: Maximilian Bundscherer, Thomas H. Schmitt, Sebastian Bayerl, Thomas Auerbach, Tobias Bocklet

    Abstract: This paper describes a machine learning approach to determine the abrasive belt wear of wide belt sanders used in industrial processes based on acoustic data, regardless of the sanding process-related parameters, Feed speed, Grit Size, and Type of material. Our approach utilizes Decision Tree, Random Forest, k-nearest Neighbors, and Neural network Classifiers to detect the belt wear from Spectrogr…

    Submitted 24 October, 2022; originally announced October 2022.

    Comments: Conference IEEE SENSORS 2022

  12. arXiv:2206.08835  [pdf, other]

    cs.CL cs.SD eess.AS

    What can Speech and Language Tell us About the Working Alliance in Psychotherapy

    Authors: Sebastian P. Bayerl, Gabriel Roccabruna, Shammur Absar Chowdhury, Tommaso Ciulli, Morena Danieli, Korbinian Riedhammer, Giuseppe Riccardi

    Abstract: We are interested in the problem of conversational analysis and its application to the health domain. Cognitive Behavioral Therapy is a structured approach in psychotherapy, allowing the therapist to help the patient to identify and modify the malicious thoughts, behavior, or actions. This cooperative effort can be evaluated using the Working Alliance Inventory Observer-rated Shortened - a 12 item…

    Submitted 27 June, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

    Comments: Accepted at Interspeech 2022

  13. arXiv:2206.08058  [pdf, other]

    eess.AS cs.CL cs.SD

    Nonwords Pronunciation Classification in Language Development Tests for Preschool Children

    Authors: Ilja Baumann, Dominik Wagner, Sebastian Bayerl, Tobias Bocklet

    Abstract: This work aims to automatically evaluate whether the language development of children is age-appropriate. Validated speech and language tests are used for this purpose to test the auditory memory. In this work, the task is to determine whether spoken nonwords have been uttered correctly. We compare different approaches that are motivated to model specific language structures: Low-level features (F…

    Submitted 17 June, 2022; v1 submitted 16 June, 2022; originally announced June 2022.

    Comments: Accepted at Interspeech 2022

  14. arXiv:2206.05018  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Going Beyond the Cookie Theft Picture Test: Detecting Cognitive Impairments using Acoustic Features

    Authors: Franziska Braun, Andreas Erzigkeit, Hartmut Lehfeld, Thomas Hillemacher, Korbinian Riedhammer, Sebastian P. Bayerl

    Abstract: Standardized tests play a crucial role in the detection of cognitive impairment. Previous work demonstrated that automatic detection of cognitive impairment is possible using audio data from a standardized picture description task. The presented study goes beyond that, evaluating our methods on data taken from two standardized neuropsychological tests, namely the German SKT and a German version of…

    Submitted 10 June, 2022; originally announced June 2022.

    Comments: Accepted at the 25th International Conference on Text, Speech and Dialogue (TSD 2022)

  15. arXiv:2206.03400  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    The Influence of Dataset Partitioning on Dysfluency Detection Systems

    Authors: Sebastian P. Bayerl, Dominik Wagner, Elmar Nöth, Tobias Bocklet, Korbinian Riedhammer

    Abstract: This paper empirically investigates the influence of different data splits and splitting strategies on the performance of dysfluency detection systems. For this, we perform experiments using wav2vec 2.0 models with a classification head as well as support vector machines (SVM) in conjunction with the features extracted from the wav2vec 2.0 model to detect dysfluencies. We train and evaluate the sy…

    Submitted 7 June, 2022; originally announced June 2022.

    Comments: Accepted at the 25th International Conference on Text, Speech and Dialogue (TSD 2022)

  16. arXiv:2205.06799  [pdf, other]

    cs.SD cs.LG eess.AS

    The ACM Multimedia 2022 Computational Paralinguistics Challenge: Vocalisations, Stuttering, Activity, & Mosquitoes

    Authors: Björn W. Schuller, Anton Batliner, Shahin Amiriparian, Christian Bergler, Maurice Gerczuk, Natalie Holz, Pauline Larrouy-Maestri, Sebastian P. Bayerl, Korbinian Riedhammer, Adria Mallol-Ragolta, Maria Pateraki, Harry Coppock, Ivan Kiskin, Marianne Sinka, Stephen Roberts

    Abstract: The ACM Multimedia 2022 Computational Paralinguistics Challenge addresses four different problems for the first time in a research competition under well-defined conditions: In the Vocalisations and Stuttering Sub-Challenges, a classification on human non-verbal vocalisations and speech has to be made; the Activity Sub-Challenge aims at beyond-audio human activity recognition from smartwatch senso…

    Submitted 13 May, 2022; originally announced May 2022.

    Comments: 5 pages, part of the ACM Multimedia 2022 Grand Challenge "The ACM Multimedia 2022 Computational Paralinguistics Challenge (ComParE 2022)"

    MSC Class: 68

    ACM Class: I.2.7; I.5.0; J.3

  17. arXiv:2204.03428  [pdf, other]

    eess.AS cs.CL cs.SD

    Detecting Vocal Fatigue with Neural Embeddings

    Authors: Sebastian P. Bayerl, Dominik Wagner, Ilja Baumann, Korbinian Riedhammer, Tobias Bocklet

    Abstract: Vocal fatigue refers to the feeling of tiredness and weakness of voice due to extended utilization. This paper investigates the effectiveness of neural embeddings for the detection of vocal fatigue. We compare x-vectors, ECAPA-TDNN, and wav2vec 2.0 embeddings on a corpus of academic spoken English. Low-dimensional mappings of the data reveal that neural embeddings capture information about the cha…

    Submitted 17 January, 2023; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Accepted for Publication in the Journal of Voice

  18. Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0

    Authors: Sebastian P. Bayerl, Dominik Wagner, Elmar Nöth, Korbinian Riedhammer

    Abstract: Stuttering is a varied speech disorder that harms an individual's communication ability. Persons who stutter (PWS) often use speech therapy to cope with their condition. Improving speech recognition systems for people with such non-typical speech or tracking the effectiveness of speech therapy would require systems that can detect dysfluencies while at the same time being able to detect speech tec…

    Submitted 16 June, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Accepted at Interspeech 2022

  19. arXiv:2203.05383  [pdf, other]

    eess.AS cs.CL

    KSoF: The Kassel State of Fluency Dataset -- A Therapy Centered Dataset of Stuttering

    Authors: Sebastian P. Bayerl, Alexander Wolff von Gudenberg, Florian Hönig, Elmar Nöth, Korbinian Riedhammer

    Abstract: Stuttering is a complex speech disorder that negatively affects an individual's ability to communicate effectively. Persons who stutter (PWS) often suffer considerably under the condition and seek help through therapy. Fluency shaping is a therapy approach where PWSs learn to modify their speech to help them to overcome their stutter. Mastering such speech techniques takes time and practice, even…

    Submitted 16 June, 2022; v1 submitted 10 March, 2022; originally announced March 2022.

    Comments: Accepted at LREC 2022 Conference on Language Resources and Evaluation

  20. arXiv:2112.06603  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Detecting Emotion Carriers by Combining Acoustic and Lexical Representations

    Authors: Sebastian P. Bayerl, Aniruddha Tammewar, Korbinian Riedhammer, Giuseppe Riccardi

    Abstract: Personal narratives (PN) - spoken or written - are recollections of facts, people, events, and thoughts from one's own experience. Emotion recognition and sentiment analysis tasks are usually defined at the utterance or document level. However, in this work, we focus on Emotion Carriers (EC) defined as the segments (speech or text) that best explain the emotional state of the narrator ("loss of fa…

    Submitted 13 December, 2021; originally announced December 2021.

    Comments: Accepted at ASRU 2021 https://asru2021.org/

  21. arXiv:2106.09545  [pdf, other]

    eess.AS cs.CL cs.SD

    STAN: A stuttering therapy analysis helper

    Authors: Sebastian P. Bayerl, Marc Wenninger, Jochen Schmidt, Alexander Wolff von Gudenberg, Korbinian Riedhammer

    Abstract: Stuttering is a complex speech disorder identified by repetitions, prolongations of sounds, syllables or words and blocks while speaking. Specific stuttering behaviour differs strongly, thus needing personalized therapy. Therapy sessions require a high level of concentration by the therapist. We introduce STAN, a system to aid speech therapists in stuttering therapy sessions. Such an automated feedbac…

    Submitted 15 June, 2021; originally announced June 2021.

    Report number: https://rc.signalprocessingsociety.org/workshops/slt-2021/SLT21VID155.html?source=IBP

    Journal ref: Demo presented at 2021 IEEE Spoken Language Technology Workshop (SLT)

  22. Offline Model Guard: Secure and Private ML on Mobile Devices

    Authors: Sebastian P. Bayerl, Tommaso Frassetto, Patrick Jauernig, Korbinian Riedhammer, Ahmad-Reza Sadeghi, Thomas Schneider, Emmanuel Stapf, Christian Weinert

    Abstract: Performing machine learning tasks in mobile applications yields a challenging conflict of interest: highly sensitive client information (e.g., speech data) should remain private while also the intellectual property of service providers (e.g., model parameters) must be protected. Cryptographic techniques offer secure solutions for this, but have an unacceptable overhead and moreover require frequen…

    Submitted 5 July, 2020; originally announced July 2020.

    Comments: Original Publication (in the same form): DATE 2020

    Journal ref: DATE 2020, pages 460-465

  23. arXiv:2006.09222  [pdf, other]

    q-bio.QM cs.CL cs.LG cs.SD eess.AS

    Towards Automated Assessment of Stuttering and Stuttering Therapy

    Authors: Sebastian P. Bayerl, Florian Hönig, Joelle Reister, Korbinian Riedhammer

    Abstract: Stuttering is a complex speech disorder that can be identified by repetitions, prolongations of sounds, syllables or words, and blocks while speaking. Severity assessment is usually done by a speech therapist. While attempts at automated assessment were made, it is rarely used in therapy. Common methods for the assessment of stuttering severity include percent stuttered syllables (% SS), the avera…

    Submitted 16 June, 2020; originally announced June 2020.

    Comments: 10 pages, 3 figures, 1 table. Accepted at TSD 2020, 23rd International Conference on Text, Speech and Dialogue

  24. arXiv:1909.12232  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS stat.ML

    A Comparison of Hybrid and End-to-End Models for Syllable Recognition

    Authors: Sebastian P. Bayerl, Korbinian Riedhammer

    Abstract: This paper presents a comparison of a traditional hybrid speech recognition system (Kaldi using WFST and TDNN with lattice-free MMI) and a lexicon-free end-to-end model (TensorFlow implementation of multi-layer LSTM with CTC training) for German syllable recognition on the Verbmobil corpus. The results show that explicitly modeling prior knowledge is still valuable in building recognition systems…

    Submitted 19 September, 2019; originally announced September 2019.

    Comments: 22nd International Conference on Text, Speech and Dialogue (TSD 2019)

  25. Timage -- A Robust Time Series Classification Pipeline

    Authors: Marc Wenninger, Sebastian P. Bayerl, Jochen Schmidt, Korbinian Riedhammer

    Abstract: Time series are series of values ordered by time. This kind of data can be found in many real world settings. Classifying time series is a difficult task and an active area of research. This paper investigates the use of transfer learning in Deep Neural Networks and a 2D representation of time series known as Recurrence Plots. In order to utilize the research done in the area of image classificati…

    Submitted 19 September, 2019; originally announced September 2019.

    Comments: ICANN19, 28th International Conference on Artificial Neural Networks