Showing 1–19 of 19 results for author: Wagner, D

Searching in archive eess.
  1. arXiv:2406.11025

    cs.SD cs.CL eess.AS

    Large Language Models for Dysfluency Detection in Stuttered Speech

    Authors: Dominik Wagner, Sebastian P. Bayerl, Ilja Baumann, Korbinian Riedhammer, Elmar Nöth, Tobias Bocklet

    Abstract: Accurately detecting dysfluencies in spoken language can help to improve the performance of automatic speech and language processing components and support the development of more inclusive speech and language technologies. Inspired by the recent trend towards the deployment of large language models (LLMs) as universal learners and processors of non-lexical inputs, such as audio and video, we appr…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  2. arXiv:2406.11022

    cs.SD eess.AS

    Outlier Reduction with Gated Attention for Improved Post-training Quantization in Large Sequence-to-sequence Speech Foundation Models

    Authors: Dominik Wagner, Ilja Baumann, Korbinian Riedhammer, Tobias Bocklet

    Abstract: This paper explores the improvement of post-training quantization (PTQ) after knowledge distillation in the Whisper speech foundation model family. We address the challenge of outliers in weights and activation tensors, known to impede quantization quality in transformer-based language and vision models. Extending this observation to Whisper, we demonstrate that these outliers are also present whe…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  3. A Multimodal Approach to Device-Directed Speech Detection with Large Language Models

    Authors: Dominik Wagner, Alexander Churchill, Siddharth Sigtia, Panayiotis Georgiou, Matt Mirsamadi, Aarshee Mishra, Erik Marchi

    Abstract: Interactions with virtual assistants typically start with a predefined trigger phrase followed by the user command. To make interactions with the assistant more intuitive, we explore whether it is feasible to drop the requirement that users must begin each command with a trigger phrase. We explore this task in three ways: First, we train classifiers using only acoustic information obtained from th…

    Submitted 26 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: text overlap with arXiv:2312.03632

  4. arXiv:2312.03632

    cs.SD cs.LG eess.AS

    Multimodal Data and Resource Efficient Device-Directed Speech Detection with Large Foundation Models

    Authors: Dominik Wagner, Alexander Churchill, Siddharth Sigtia, Panayiotis Georgiou, Matt Mirsamadi, Aarshee Mishra, Erik Marchi

    Abstract: Interactions with virtual assistants typically start with a trigger phrase followed by a command. In this work, we explore the possibility of making these interactions more natural by eliminating the need for a trigger phrase. Our goal is to determine whether a user addressed the virtual assistant based on signals obtained from the streaming audio recorded by the device microphone. We address this…

    Submitted 6 December, 2023; originally announced December 2023.

  5. arXiv:2306.06514

    cs.SD eess.AS

    Vocoder-Free Non-Parallel Conversion of Whispered Speech With Masked Cycle-Consistent Generative Adversarial Networks

    Authors: Dominik Wagner, Ilja Baumann, Tobias Bocklet

    Abstract: Cycle-consistent generative adversarial networks have been widely used in non-parallel voice conversion (VC). Their ability to learn mappings between source and target features without relying on parallel training data eliminates the need for temporal alignments. However, most methods decouple the conversion of acoustic features from synthesizing the audio signal by using separate models for conve…

    Submitted 10 June, 2023; originally announced June 2023.

  6. arXiv:2305.19255

    eess.AS cs.CL cs.SD

    A Stutter Seldom Comes Alone -- Cross-Corpus Stuttering Detection as a Multi-label Problem

    Authors: Sebastian P. Bayerl, Dominik Wagner, Ilja Baumann, Florian Hönig, Tobias Bocklet, Elmar Nöth, Korbinian Riedhammer

    Abstract: Most stuttering detection and classification research has viewed stuttering as a multi-class classification problem or a binary detection task for each dysfluency type; however, this does not match the nature of stuttering, in which one dysfluency seldom comes alone but rather co-occurs with others. This paper explores multi-language and cross-corpus end-to-end stuttering detection as a multi-labe…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted for presentation at Interspeech 2023. arXiv admin note: substantial text overlap with arXiv:2210.15982

  7. Generative Models for Improved Naturalness, Intelligibility, and Voicing of Whispered Speech

    Authors: Dominik Wagner, Sebastian P. Bayerl, Hector A. Cordourier Maruri, Tobias Bocklet

    Abstract: This work adapts two recent architectures of generative models and evaluates their effectiveness for the conversion of whispered speech to normal speech. We incorporate the normal target speech into the training criterion of vector-quantized variational autoencoders (VQ-VAEs) and MelGANs, thereby conditioning the systems to recover voiced speech from whispered inputs. Objective and subjective qual…

    Submitted 30 January, 2023; v1 submitted 4 December, 2022; originally announced December 2022.

    Comments: Accepted at SLT 2022

  8. arXiv:2211.08774

    cs.SD eess.AS

    Speaker Adaptation for End-To-End Speech Recognition Systems in Noisy Environments

    Authors: Dominik Wagner, Ilja Baumann, Sebastian P. Bayerl, Korbinian Riedhammer, Tobias Bocklet

    Abstract: We analyze the impact of speaker adaptation in end-to-end automatic speech recognition models based on transformers and wav2vec 2.0 under different noise conditions. By including speaker embeddings obtained from x-vector and ECAPA-TDNN systems, as well as i-vectors, we achieve relative word error rate improvements of up to 16.3% on LibriSpeech and up to 14.5% on Switchboard. We show that the prove…

    Submitted 7 December, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

    Comments: Accepted at ASRU 2023

  9. arXiv:2210.15982

    eess.AS cs.SD

    Dysfluencies Seldom Come Alone -- Detection as a Multi-Label Problem

    Authors: Sebastian P. Bayerl, Dominik Wagner, Florian Hönig, Tobias Bocklet, Elmar Nöth, Korbinian Riedhammer

    Abstract: Specially adapted speech recognition models are necessary to handle stuttered speech. For these to be used in a targeted manner, stuttered speech must be reliably detected. Recent works have treated stuttering as a multi-class classification problem or viewed detecting each dysfluency type as an isolated task; that does not capture the nature of stuttering, where one dysfluency seldom comes alone,…

    Submitted 28 October, 2022; originally announced October 2022.

    Comments: Submitted to ICASSP 2023

  10. arXiv:2210.15941

    eess.AS cs.SD

    Influence of Utterance and Speaker Characteristics on the Classification of Children with Cleft Lip and Palate

    Authors: Ilja Baumann, Dominik Wagner, Franziska Braun, Sebastian P. Bayerl, Elmar Nöth, Korbinian Riedhammer, Tobias Bocklet

    Abstract: Recent findings show that pre-trained wav2vec 2.0 models are reliable feature extractors for various speaker characteristics classification tasks. We show that latent representations extracted at different layers of a pre-trained wav2vec 2.0 system can be used as features for binary classification to distinguish between children with Cleft Lip and Palate (CLP) and a healthy control group. The resu…

    Submitted 1 August, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

    Comments: INTERSPEECH 2023

  11. arXiv:2210.15336

    eess.AS cs.SD

    Multi-class Detection of Pathological Speech with Latent Features: How does it perform on unseen data?

    Authors: Dominik Wagner, Ilja Baumann, Franziska Braun, Sebastian P. Bayerl, Elmar Nöth, Korbinian Riedhammer, Tobias Bocklet

    Abstract: The detection of pathologies from speech features is usually defined as a binary classification task with one class representing a specific pathology and the other class representing healthy speech. In this work, we train neural networks, large margin classifiers, and tree boosting machines to distinguish between four pathologies: Parkinson's disease, laryngeal cancer, cleft lip and palate, and or…

    Submitted 1 August, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: INTERSPEECH 2023

  12. arXiv:2206.08058

    eess.AS cs.CL cs.SD

    Nonwords Pronunciation Classification in Language Development Tests for Preschool Children

    Authors: Ilja Baumann, Dominik Wagner, Sebastian Bayerl, Tobias Bocklet

    Abstract: This work aims to automatically evaluate whether the language development of children is age-appropriate. Validated speech and language tests are used for this purpose to test the auditory memory. In this work, the task is to determine whether spoken nonwords have been uttered correctly. We compare different approaches that are motivated to model specific language structures: Low-level features (F…

    Submitted 17 June, 2022; v1 submitted 16 June, 2022; originally announced June 2022.

    Comments: Accepted at Interspeech 2022

  13. arXiv:2206.03400

    eess.AS cs.CL cs.SD

    The Influence of Dataset Partitioning on Dysfluency Detection Systems

    Authors: Sebastian P. Bayerl, Dominik Wagner, Elmar Nöth, Tobias Bocklet, Korbinian Riedhammer

    Abstract: This paper empirically investigates the influence of different data splits and splitting strategies on the performance of dysfluency detection systems. For this, we perform experiments using wav2vec 2.0 models with a classification head as well as support vector machines (SVM) in conjunction with the features extracted from the wav2vec 2.0 model to detect dysfluencies. We train and evaluate the sy…

    Submitted 7 June, 2022; originally announced June 2022.

    Comments: Accepted at the 25th International Conference on Text, Speech and Dialogue (TSD 2022)

  14. arXiv:2204.03428

    eess.AS cs.CL cs.SD

    Detecting Vocal Fatigue with Neural Embeddings

    Authors: Sebastian P. Bayerl, Dominik Wagner, Ilja Baumann, Korbinian Riedhammer, Tobias Bocklet

    Abstract: Vocal fatigue refers to the feeling of tiredness and weakness of voice due to extended utilization. This paper investigates the effectiveness of neural embeddings for the detection of vocal fatigue. We compare x-vectors, ECAPA-TDNN, and wav2vec 2.0 embeddings on a corpus of academic spoken English. Low-dimensional mappings of the data reveal that neural embeddings capture information about the cha…

    Submitted 17 January, 2023; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Accepted for Publication in the Journal of Voice

  15. Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0

    Authors: Sebastian P. Bayerl, Dominik Wagner, Elmar Nöth, Korbinian Riedhammer

    Abstract: Stuttering is a varied speech disorder that harms an individual's communication ability. Persons who stutter (PWS) often use speech therapy to cope with their condition. Improving speech recognition systems for people with such non-typical speech or tracking the effectiveness of speech therapy would require systems that can detect dysfluencies while at the same time being able to detect speech tec…

    Submitted 16 June, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Accepted at Interspeech 2022

  16. arXiv:2104.10752

    eess.SY

    A Concise Guide on the Integration of Battery Electric Buses into Urban Bus Networks

    Authors: Nicolas Dirks, Dennis Wagner, Maximilian Schiffer, Grit Walther

    Abstract: With the increasing market penetration of battery-electric buses into urban bus networks, practitioners face many novel planning problems. As a result, the interest in optimization-based decision-making for these planning problems increases but practitioners' requirements on planning solutions and current academic approaches often diverge. Against this background, this survey aims to provide a con…

    Submitted 21 April, 2021; originally announced April 2021.

    Comments: 24 pages, 8 figures, 11 tables

  17. arXiv:2005.09313

    eess.SY math.OC

    Measures and LMIs for Adaptive Control Validation

    Authors: Daniel Wagner, Didier Henrion, Martin Hromčík

    Abstract: Occupation measures and linear matrix inequality (LMI) relaxations (called the moment sums of squares or Lasserre hierarchy) have been used previously as a means for solving control law verification and validation (VV) problems. However, these methods have been restricted to relatively simple control laws and a limited number of states. In this document, we extend these methods to model reference…

    Submitted 19 May, 2020; originally announced May 2020.

  18. arXiv:2003.11292

    eess.SY math.OC

    Measures and LMIs for Lateral F-16 MRAC Validation

    Authors: Daniel Wagner, Didier Henrion, Martin Hromčík

    Abstract: Occupation measures and linear matrix inequality (LMI) relaxations (called the moment sums of squares or Lasserre hierarchy) are state-of-the-art methods for verification and validation (VV) in aerospace. In this document, we extend these results to a full F-16 closed-loop nonlinear dutch roll polynomial model complete with model reference adaptive control (MRAC). This is done through a new techn…

    Submitted 25 March, 2020; originally announced March 2020.

  19. arXiv:1505.05747

    eess.SY

    Operating Power Grids with Few Flow Control Buses

    Authors: Thomas Leibfried, Tamara Mchedlidze, Nico Meyer-Hübner, Martin Nöllenburg, Ignaz Rutter, Peter Sanders, Dorothea Wagner, Franziska Wegner

    Abstract: Future power grids will offer enhanced controllability due to the increased availability of power flow control units (FACTS). As the installation of control units in the grid is an expensive investment, we are interested in using few controllers to achieve high controllability. In particular, two questions arise: How many flow control buses are necessary to obtain globally optimal power flows? And…

    Submitted 21 May, 2015; originally announced May 2015.

    Comments: extended version of an ACM e-Energy 2015 poster/workshop paper