
Showing 1–9 of 9 results for author: Mittag, G

Searching in archive eess.
  1. arXiv:2303.12761  [pdf, other]

    eess.IV cs.LG

    LSTM-based Video Quality Prediction Accounting for Temporal Distortions in Videoconferencing Calls

    Authors: Gabriel Mittag, Babak Naderi, Vishak Gopal, Ross Cutler

    Abstract: Current state-of-the-art video quality models, such as VMAF, give excellent prediction results by comparing the degraded video with its reference video. However, they do not consider temporal distortions (e.g., frame freezes or skips) that occur during videoconferencing calls. In this paper, we present a data-driven approach for modeling such distortions automatically by training an LSTM with subj…

    Submitted 22 March, 2023; originally announced March 2023.

    Comments: Accepted at ICASSP 2023

  2. arXiv:2203.16032  [pdf, other]

    cs.SD eess.AS

    ConferencingSpeech 2022 Challenge: Non-intrusive Objective Speech Quality Assessment (NISQA) Challenge for Online Conferencing Applications

    Authors: Gaoxiong Yi, Wei Xiao, Yiming Xiao, Babak Naderi, Sebastian Möller, Wafaa Wardah, Gabriel Mittag, Ross Cutler, Zhuohuang Zhang, Donald S. Williamson, Fei Chen, Fuzheng Yang, Shidong Shang

    Abstract: With the advances in speech communication systems such as online conferencing applications, we can seamlessly work with people regardless of where they are. However, during online meetings, speech quality can be significantly affected by background noise, reverberation, packet loss, network jitter, etc. Because of its nature, speech quality is traditionally assessed in subjective tests in laborato…

    Submitted 31 March, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

  3. arXiv:2112.06219  [pdf, other]

    cs.SD cs.LG eess.AS

    Visualising and Explaining Deep Learning Models for Speech Quality Prediction

    Authors: H. Tilkorn, G. Mittag, S. Möller

    Abstract: Estimating the quality of transmitted speech is known to be a non-trivial task. While test participants are traditionally asked to rate the quality of samples, automated methods are now available. These methods can be divided into: 1) intrusive models, which use both the original and the degraded signals, and 2) non-intrusive models, which only require the degraded signal. Recently, non-intrus…

    Submitted 12 December, 2021; originally announced December 2021.

    Comments: 4 pages, 6 figures, In Proceedings of the DAGA 2021 (the annual conference of the German Acoustical Society, DEGA)

    ACM Class: I.2.7

  4. arXiv:2105.00783  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD

    Full-Reference Speech Quality Estimation with Attentional Siamese Neural Networks

    Authors: Gabriel Mittag, Sebastian Möller

    Abstract: In this paper, we present a full-reference speech quality prediction model with a deep learning approach. The model determines a feature representation of the reference and the degraded signal through a siamese recurrent convolutional network that shares the weights for both signals as input. The resulting features are then used to align the signals with an attention mechanism and are finally comb…

    Submitted 3 May, 2021; originally announced May 2021.

    Comments: Late upload, presented at ICASSP 2020

  5. arXiv:2104.11673  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG eess.AS

    Deep Learning Based Assessment of Synthetic Speech Naturalness

    Authors: Gabriel Mittag, Sebastian Möller

    Abstract: In this paper, we present a new objective prediction model for synthetic speech naturalness. It can be used to evaluate Text-To-Speech or Voice Conversion systems and works independently of language. The model is trained end-to-end and based on a CNN-LSTM network that was previously shown to give good results for speech quality estimation. We trained and tested the model on 16 different datasets, such a…

    Submitted 23 April, 2021; originally announced April 2021.

    Comments: Late upload, presented at Interspeech 2020

  6. arXiv:2104.10217  [pdf, other]

    eess.AS cs.LG cs.SD eess.IV

    Bias-Aware Loss for Training Image and Speech Quality Prediction Models from Multiple Datasets

    Authors: Gabriel Mittag, Saman Zadtootaghaj, Thilo Michael, Babak Naderi, Sebastian Möller

    Abstract: The ground truth used for training image, video, or speech quality prediction models is based on the Mean Opinion Scores (MOS) obtained from subjective experiments. Usually, it is necessary to conduct multiple experiments, mostly with different test participants, to obtain enough data to train quality models based on machine learning. Each of these experiments is subject to an experiment-specific…

    Submitted 20 April, 2021; originally announced April 2021.

    Comments: Accepted at QoMEX 2021

  7. arXiv:2104.09494  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD

    NISQA: A Deep CNN-Self-Attention Model for Multidimensional Speech Quality Prediction with Crowdsourced Datasets

    Authors: Gabriel Mittag, Babak Naderi, Assmaa Chehadi, Sebastian Möller

    Abstract: In this paper, we present an update to the NISQA speech quality prediction model that is focused on distortions that occur in communication networks. In contrast to the previous version, the model is trained end-to-end and the time-dependency modelling and time-pooling are achieved through a Self-Attention mechanism. Besides overall speech quality, the model also predicts the four speech quality di…

    Submitted 19 April, 2021; originally announced April 2021.

    Comments: Submitted to Interspeech 2021

  8. arXiv:2010.13260  [pdf, ps, other]

    cs.MM cs.SD eess.AS

    Effect of Language Proficiency on Subjective Evaluation of Noise Suppression Algorithms

    Authors: Babak Naderi, Gabriel Mittag, Rafael Zequeira Jiménez, Sebastian Möller

    Abstract: Speech communication systems based on Voice-over-IP technology are frequently used by native as well as non-native speakers of a target language, e.g. in international phone calls or telemeetings. Frequently, such calls also occur in a noisy environment, making noise suppression modules necessary to increase perceived quality of experience. Whereas standard tests for assessing perceived quality ma…

    Submitted 25 October, 2020; originally announced October 2020.

  9. arXiv:2007.14598  [pdf, other]

    eess.AS cs.SD

    DNN No-Reference PSTN Speech Quality Prediction

    Authors: Gabriel Mittag, Ross Cutler, Yasaman Hosseinkashi, Michael Revow, Sriram Srinivasan, Naglakshmi Chande, Robert Aichner

    Abstract: Classic public switched telephone networks (PSTN) are often a black box for VoIP network providers, as they have no access to performance indicators, such as delay or packet loss. Only the degraded output speech signal can be used to monitor the speech quality of these networks. However, the current state-of-the-art speech quality models are not reliable enough to be used for live monitoring. One…

    Submitted 29 July, 2020; originally announced July 2020.