Showing 1–6 of 6 results for author: Strake, M

  1. arXiv:2306.02778  [pdf, other]

    eess.AS

    EffCRN: An Efficient Convolutional Recurrent Network for High-Performance Speech Enhancement

    Authors: Marvin Sach, Jan Franzen, Bruno Defraene, Kristoff Fluyt, Maximilian Strake, Wouter Tirry, Tim Fingscheidt

    Abstract: Fully convolutional recurrent neural networks (FCRNs) have shown state-of-the-art performance in single-channel speech enhancement. However, the number of parameters and the FLOPs/second of the original FCRN are restrictively high. A further important class of efficient networks is the CRUSE topology, serving as reference in our work. By applying a number of topological changes at once, we propose…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: 5 pages, 5 figures, accepted for Interspeech 2023

  2. arXiv:2205.02085  [pdf, other]

    eess.AS cs.SD

    Does a PESQNet (Loss) Require a Clean Reference Input? The Original PESQ Does, But ACR Listening Tests Don't

    Authors: Ziyi Xu, Maximilian Strake, Tim Fingscheidt

    Abstract: Perceptual evaluation of speech quality (PESQ) requires a clean speech reference as input, but predicts the results from (reference-free) absolute category rating (ACR) tests. In this work, we train a fully convolutional recurrent neural network (FCRN) as deep noise suppression (DNS) model, with either a non-intrusive or an intrusive PESQNet, where only the latter has access to a clean speech refe…

    Submitted 13 May, 2022; v1 submitted 4 May, 2022; originally announced May 2022.

  3. arXiv:2111.03847  [pdf, other]

    eess.AS cs.SD

    Deep Noise Suppression Maximizing Non-Differentiable PESQ Mediated by a Non-Intrusive PESQNet

    Authors: Ziyi Xu, Maximilian Strake, Tim Fingscheidt

    Abstract: Speech enhancement employing deep neural networks (DNNs) for denoising are called deep noise suppression (DNS). During training, DNS methods are typically trained with mean squared error (MSE) type loss functions, which do not guarantee good perceptual quality. Perceptual evaluation of speech quality (PESQ) is a widely used metric for evaluating speech quality. However, the original PESQ algorithm…

    Submitted 6 November, 2021; originally announced November 2021.

  4. arXiv:2103.17189  [pdf, ps, other]

    eess.AS cs.SD

    Y$^2$-Net FCRN for Acoustic Echo and Noise Suppression

    Authors: Ernst Seidel, Jan Franzen, Maximilian Strake, Tim Fingscheidt

    Abstract: In recent years, deep neural networks (DNNs) were studied as an alternative to traditional acoustic echo cancellation (AEC) algorithms. The proposed models achieved remarkable performance for the separate tasks of AEC and residual echo suppression (RES). A promising network topology is a fully convolutional recurrent network (FCRN) structure, which has already proven its performance on both noise…

    Submitted 18 July, 2021; v1 submitted 31 March, 2021; originally announced March 2021.

    Comments: 5 pages, 2 figures, accepted for Interspeech 2021

  5. arXiv:2103.17088  [pdf, ps, other]

    eess.AS

    Deep Noise Suppression With Non-Intrusive PESQNet Supervision Enabling the Use of Real Training Data

    Authors: Ziyi Xu, Maximilian Strake, Tim Fingscheidt

    Abstract: Data-driven speech enhancement employing deep neural networks (DNNs) can provide state-of-the-art performance even in the presence of non-stationary noise. During the training process, most of the speech enhancement neural networks are trained in a fully supervised way with losses requiring noisy speech to be synthesized by clean speech and additive noise. However, in a real implementation, only t…

    Submitted 31 March, 2021; originally announced March 2021.

  6. arXiv:1810.11217  [pdf, ps, other]

    eess.AS

    Concatenated Identical DNN (CI-DNN) to Reduce Noise-Type Dependence in DNN-Based Speech Enhancement

    Authors: Ziyi Xu, Maximilian Strake, Tim Fingscheidt

    Abstract: Estimating time-frequency domain masks for speech enhancement using deep learning approaches has recently become a popular field of research. In this paper, we propose a mask-based speech enhancement framework by using concatenated identical deep neural networks (CI-DNNs). The idea is that a single DNN is trained under multiple input and output signal-to-noise power ratio (SNR) conditions, using t…

    Submitted 26 October, 2018; originally announced October 2018.