Showing 1–14 of 14 results for author: Baumann, I

  1. arXiv:2406.11025  [pdf, other]

    cs.SD cs.CL eess.AS

    Large Language Models for Dysfluency Detection in Stuttered Speech

    Authors: Dominik Wagner, Sebastian P. Bayerl, Ilja Baumann, Korbinian Riedhammer, Elmar Nöth, Tobias Bocklet

    Abstract: Accurately detecting dysfluencies in spoken language can help to improve the performance of automatic speech and language processing components and support the development of more inclusive speech and language technologies. Inspired by the recent trend towards the deployment of large language models (LLMs) as universal learners and processors of non-lexical inputs, such as audio and video, we appr…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  2. arXiv:2406.11022  [pdf, other]

    cs.SD eess.AS

    Outlier Reduction with Gated Attention for Improved Post-training Quantization in Large Sequence-to-sequence Speech Foundation Models

    Authors: Dominik Wagner, Ilja Baumann, Korbinian Riedhammer, Tobias Bocklet

    Abstract: This paper explores the improvement of post-training quantization (PTQ) after knowledge distillation in the Whisper speech foundation model family. We address the challenge of outliers in weights and activation tensors, known to impede quantization quality in transformer-based language and vision models. Extending this observation to Whisper, we demonstrate that these outliers are also present whe…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  3. arXiv:2406.11016  [pdf, other]

    cs.LG cs.CL

    Optimized Speculative Sampling for GPU Hardware Accelerators

    Authors: Dominik Wagner, Seanie Lee, Ilja Baumann, Philipp Seeberger, Korbinian Riedhammer, Tobias Bocklet

    Abstract: In this work, we optimize speculative sampling for parallel hardware accelerators to improve sampling speed. We notice that substantial portions of the intermediate matrices necessary for speculative sampling can be computed concurrently. This allows us to distribute the workload across multiple GPU threads, enabling simultaneous operations on matrix segments within thread blocks. Additionally, we…

    Submitted 16 June, 2024; originally announced June 2024.

  4. arXiv:2402.15294  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    A Survey of Music Generation in the Context of Interaction

    Authors: Ismael Agchar, Ilja Baumann, Franziska Braun, Paula Andrea Perez-Toro, Korbinian Riedhammer, Sebastian Trump, Martin Ullrich

    Abstract: In recent years, machine learning, and in particular generative adversarial neural networks (GANs) and attention-based neural networks (transformers), have been successfully used to compose and generate music, both melodies and polyphonic pieces. Current research focuses foremost on style replication (e.g. generating a Bach-style chorale) or style transfer (e.g. classical to jazz) based on large amo…

    Submitted 23 February, 2024; originally announced February 2024.

  5. arXiv:2306.06514  [pdf, other]

    cs.SD eess.AS

    Vocoder-Free Non-Parallel Conversion of Whispered Speech With Masked Cycle-Consistent Generative Adversarial Networks

    Authors: Dominik Wagner, Ilja Baumann, Tobias Bocklet

    Abstract: Cycle-consistent generative adversarial networks have been widely used in non-parallel voice conversion (VC). Their ability to learn mappings between source and target features without relying on parallel training data eliminates the need for temporal alignments. However, most methods decouple the conversion of acoustic features from synthesizing the audio signal by using separate models for conve…

    Submitted 10 June, 2023; originally announced June 2023.

  6. arXiv:2305.19255  [pdf, other]

    eess.AS cs.CL cs.SD

    A Stutter Seldom Comes Alone -- Cross-Corpus Stuttering Detection as a Multi-label Problem

    Authors: Sebastian P. Bayerl, Dominik Wagner, Ilja Baumann, Florian Hönig, Tobias Bocklet, Elmar Nöth, Korbinian Riedhammer

    Abstract: Most stuttering detection and classification research has viewed stuttering as a multi-class classification problem or a binary detection task for each dysfluency type; however, this does not match the nature of stuttering, in which one dysfluency seldom comes alone but rather co-occurs with others. This paper explores multi-language and cross-corpus end-to-end stuttering detection as a multi-labe…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted for presentation at Interspeech 2023. arXiv admin note: substantial text overlap with arXiv:2210.15982

  7. arXiv:2211.08774  [pdf, other]

    cs.SD eess.AS

    Speaker Adaptation for End-To-End Speech Recognition Systems in Noisy Environments

    Authors: Dominik Wagner, Ilja Baumann, Sebastian P. Bayerl, Korbinian Riedhammer, Tobias Bocklet

    Abstract: We analyze the impact of speaker adaptation in end-to-end automatic speech recognition models based on transformers and wav2vec 2.0 under different noise conditions. By including speaker embeddings obtained from x-vector and ECAPA-TDNN systems, as well as i-vectors, we achieve relative word error rate improvements of up to 16.3% on LibriSpeech and up to 14.5% on Switchboard. We show that the prove…

    Submitted 7 December, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

    Comments: Accepted at ASRU 2023

  8. arXiv:2210.15941  [pdf, other]

    eess.AS cs.SD

    Influence of Utterance and Speaker Characteristics on the Classification of Children with Cleft Lip and Palate

    Authors: Ilja Baumann, Dominik Wagner, Franziska Braun, Sebastian P. Bayerl, Elmar Nöth, Korbinian Riedhammer, Tobias Bocklet

    Abstract: Recent findings show that pre-trained wav2vec 2.0 models are reliable feature extractors for various speaker characteristics classification tasks. We show that latent representations extracted at different layers of a pre-trained wav2vec 2.0 system can be used as features for binary classification to distinguish between children with Cleft Lip and Palate (CLP) and a healthy control group. The resu…

    Submitted 1 August, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

    Comments: INTERSPEECH 2023

  9. arXiv:2210.15336  [pdf, ps, other]

    eess.AS cs.SD

    Multi-class Detection of Pathological Speech with Latent Features: How does it perform on unseen data?

    Authors: Dominik Wagner, Ilja Baumann, Franziska Braun, Sebastian P. Bayerl, Elmar Nöth, Korbinian Riedhammer, Tobias Bocklet

    Abstract: The detection of pathologies from speech features is usually defined as a binary classification task with one class representing a specific pathology and the other class representing healthy speech. In this work, we train neural networks, large margin classifiers, and tree boosting machines to distinguish between four pathologies: Parkinson's disease, laryngeal cancer, cleft lip and palate, and or…

    Submitted 1 August, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: INTERSPEECH 2023

  10. arXiv:2206.08058  [pdf, other]

    eess.AS cs.CL cs.SD

    Nonwords Pronunciation Classification in Language Development Tests for Preschool Children

    Authors: Ilja Baumann, Dominik Wagner, Sebastian Bayerl, Tobias Bocklet

    Abstract: This work aims to automatically evaluate whether the language development of children is age-appropriate. Validated speech and language tests are used for this purpose to test the auditory memory. In this work, the task is to determine whether spoken nonwords have been uttered correctly. We compare different approaches that are motivated to model specific language structures: Low-level features (F…

    Submitted 17 June, 2022; v1 submitted 16 June, 2022; originally announced June 2022.

    Comments: Accepted at Interspeech 2022

  11. arXiv:2204.03428  [pdf, other]

    eess.AS cs.CL cs.SD

    Detecting Vocal Fatigue with Neural Embeddings

    Authors: Sebastian P. Bayerl, Dominik Wagner, Ilja Baumann, Korbinian Riedhammer, Tobias Bocklet

    Abstract: Vocal fatigue refers to the feeling of tiredness and weakness of voice due to extended utilization. This paper investigates the effectiveness of neural embeddings for the detection of vocal fatigue. We compare x-vectors, ECAPA-TDNN, and wav2vec 2.0 embeddings on a corpus of academic spoken English. Low-dimensional mappings of the data reveal that neural embeddings capture information about the cha…

    Submitted 17 January, 2023; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Accepted for Publication in the Journal of Voice

  12. On the size distribution of sunspot groups in the Greenwich sunspot record 1874-1976

    Authors: I. Baumann, S. K. Solanki

    Abstract: We investigate the size distribution of the maximum areas and the instantaneous distribution of areas of sunspot groups using the Greenwich sunspot group record spanning the interval 1874-1976. Both distributions are found to be well described by log-normal functions. Using a simple model we can transform the maximum area distribution into the instantaneous area distribution if the sunspot area…

    Submitted 18 October, 2005; originally announced October 2005.

    Comments: accepted by A&A

  13. arXiv:astro-ph/0510322  [pdf, ps, other]

    astro-ph

    A necessary extension of the surface flux transport model

    Authors: I. Baumann, D. Schmitt, M. Schuessler

    Abstract: Customary two-dimensional flux transport models for the evolution of the magnetic field at the solar surface do not account for the radial structure and the volume diffusion of the magnetic field. When considering the long-term evolution of magnetic flux, this omission can lead to an unrealistic long-term memory of the system and to the suppression of polar field reversals. In order to avoid suc…

    Submitted 11 October, 2005; originally announced October 2005.

    Comments: for further information visit: http://solweb.oma.be/users/baumann/

  14. arXiv:quant-ph/0002086  [pdf, ps, other]

    quant-ph physics.atm-clus physics.atom-ph

    Pressure dependence of the Mg $3s4s^3S_1 \to 3s3p^3P_{0,1,2}$ transition in superfluid $^4$He

    Authors: I. Baumann, A. Breidenassel, C. Zühlke, A. Kasimov, G. zu Putlitz, I. Reinhard, K. Jungmann

    Abstract: The pressure shifts of the $3s4s^3S_1 \to 3s3p^3P_{0,1,2}$ transition of magnesium atoms immersed in superfluid helium have been measured at $(1.3 \pm 0.1)$ K between saturated vapour pressure and $24$ bar. The wavelength is blue-shifted linearly by $(0.07 \pm 0.01)$ nm/bar. This value can be satisfactorily described in the framework of the standard bubble model.

    Submitted 28 February, 2000; originally announced February 2000.

    Comments: submitted to EPJD

    Journal ref: Eur. Phys. J. D 12, 117-122 (2000)