
Showing 1–7 of 7 results for author: Wierstorf, H

  1. arXiv:2312.06270

    eess.AS cs.SD

    Testing Speech Emotion Recognition Machine Learning Models

    Authors: Anna Derington, Hagen Wierstorf, Ali Özkil, Florian Eyben, Felix Burkhardt, Björn W. Schuller

    Abstract: Machine learning models for speech emotion recognition (SER) can be trained for different tasks and are usually evaluated on the basis of a few available datasets per task. Tasks could include arousal, valence, dominance, emotional categories, or tone of voice. Those models are mainly evaluated in terms of correlation or recall, and always show some errors in their predictions. The errors manifest…

    Submitted 10 July, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

  2. arXiv:2306.16962

    cs.SD eess.AS

    Speech-based Age and Gender Prediction with Transformers

    Authors: Felix Burkhardt, Johannes Wagner, Hagen Wierstorf, Florian Eyben, Björn Schuller

    Abstract: We report on the curation of several publicly available datasets for age and gender prediction. Furthermore, we present experiments to predict age and gender with models based on a pre-trained wav2vec 2.0. Depending on the dataset, we achieve an MAE between 7.1 years and 10.8 years for age, and at least 91.1% ACC for gender (female, male, child). Compared to a modelling approach built on handcraft…

    Submitted 29 June, 2023; originally announced June 2023.

    Comments: 5 pages, submitted to 15th ITG Conference on Speech Communication

  3. arXiv:2303.00645

    eess.AS cs.SD

    audb -- Sharing and Versioning of Audio and Annotation Data in Python

    Authors: Hagen Wierstorf, Johannes Wagner, Florian Eyben, Felix Burkhardt, Björn W. Schuller

    Abstract: Driven by the need for larger and more diverse datasets to pre-train and fine-tune increasingly complex machine learning models, the number of datasets is rapidly growing. audb is an open-source Python library that supports versioning and documentation of audio datasets. It aims to provide a standardized and simple user-interface to publish, maintain, and access the annotations and audio files of…

    Submitted 10 May, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

  4. Probing Speech Emotion Recognition Transformers for Linguistic Knowledge

    Authors: Andreas Triantafyllopoulos, Johannes Wagner, Hagen Wierstorf, Maximilian Schmitt, Uwe Reichel, Florian Eyben, Felix Burkhardt, Björn W. Schuller

    Abstract: Large, pre-trained neural networks consisting of self-attention layers (transformers) have recently achieved state-of-the-art results on several speech emotion recognition (SER) datasets. These models are typically pre-trained in self-supervised manner with the goal to improve automatic speech recognition performance -- and thus, to understand linguistic information. In this work, we investigate t…

    Submitted 26 July, 2022; v1 submitted 1 April, 2022; originally announced April 2022.

    Comments: Accepted in INTERSPEECH 2022

    Journal ref: Proc. Interspeech 2022, 146-150

  5. arXiv:2203.07378

    eess.AS cs.LG cs.SD

    Dawn of the transformer era in speech emotion recognition: closing the valence gap

    Authors: Johannes Wagner, Andreas Triantafyllopoulos, Hagen Wierstorf, Maximilian Schmitt, Felix Burkhardt, Florian Eyben, Björn W. Schuller

    Abstract: Recent advances in transformer-based architectures which are pre-trained in self-supervised manner have shown great promise in several machine learning tasks. In the audio domain, such architectures have also been successfully utilised in the field of speech emotion recognition (SER). However, existing works have not evaluated the influence of model size and pre-training data on downstream perform…

    Submitted 7 September, 2023; v1 submitted 14 March, 2022; originally announced March 2022.

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 9, pp. 10745-10759, 1 Sept. 2023

  6. arXiv:1811.00454

    cs.SD cs.LG cs.MM eess.AS

    Referenceless Performance Evaluation of Audio Source Separation using Deep Neural Networks

    Authors: Emad M. Grais, Hagen Wierstorf, Dominic Ward, Russell Mason, Mark D. Plumbley

    Abstract: Current performance evaluation for audio source separation depends on comparing the processed or separated signals with reference signals. Therefore, common performance evaluation toolkits are not applicable to real-world situations where the ground truth audio is unavailable. In this paper, we propose a performance evaluation technique that does not require reference signals in order to assess se…

    Submitted 1 November, 2018; originally announced November 2018.

    MSC Class: 68T01; 68T10; 68T45; 62H25 ACM Class: H.5.5; I.5; I.2.6; I.4.3; I.4; I.2

    Journal ref: This paper will be presented at EUSIPCO 2019

  7. arXiv:1710.11473

    cs.SD cs.CV cs.LG eess.AS

    Multi-Resolution Fully Convolutional Neural Networks for Monaural Audio Source Separation

    Authors: Emad M. Grais, Hagen Wierstorf, Dominic Ward, Mark D. Plumbley

    Abstract: In deep neural networks with convolutional layers, each layer typically has fixed-size/single-resolution receptive field (RF). Convolutional layers with a large RF capture global information from the input features, while layers with small RF size capture local details with high resolution from the input features. In this work, we introduce novel deep multi-resolution fully convolutional neural ne…

    Submitted 28 October, 2017; originally announced October 2017.

    Comments: arXiv admin note: text overlap with arXiv:1703.08019

    MSC Class: 68T01 ACM Class: H.5.5; I.5; I.2.6; I.4.3; I.4; I.2