Showing 1–25 of 25 results for author: Fingscheidt, T

Searching in archive eess.
  1. arXiv:2406.04660  [pdf, other]

    eess.AS cs.SD

    URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement

    Authors: Wangyou Zhang, Robin Scheibler, Kohei Saijo, Samuele Cornell, Chenda Li, Zhaoheng Ni, Anurag Kumar, Jan Pirklbauer, Marvin Sach, Shinji Watanabe, Tim Fingscheidt, Yanmin Qian

    Abstract: The last decade has witnessed significant advancements in deep learning-based speech enhancement (SE). However, most existing SE research has limitations on the coverage of SE sub-tasks, data diversity and amount, and evaluation metrics. To fill this gap and promote research toward universal SE, we establish a new SE challenge, named URGENT, to focus on the universality, robustness, and generaliza…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 6 pages, 3 figures, 3 tables. Accepted by Interspeech 2024. An extended version of the accepted manuscript with appendix

  2. arXiv:2404.11621  [pdf, ps, other]

    eess.AS

    Efficient High-Performance Bark-Scale Neural Network for Residual Echo and Noise Suppression

    Authors: Ernst Seidel, Pejman Mowlaee, Tim Fingscheidt

    Abstract: In recent years, the introduction of neural networks (NNs) into the field of speech enhancement has brought significant improvements. However, many of the proposed methods are quite demanding in terms of computational complexity and memory footprint. For the application in dedicated communication devices, such as speakerphones, hands-free car systems, or smartphones, efficiency plays a major role…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: accepted to ICASSP 2024; 5 pages, 3 figures

  3. arXiv:2309.02432  [pdf, other]

    eess.AS cs.SD

    Employing Real Training Data for Deep Noise Suppression

    Authors: Ziyi Xu, Marvin Sach, Jan Pirklbauer, Tim Fingscheidt

    Abstract: Most deep noise suppression (DNS) models are trained with reference-based losses requiring access to clean speech. However, sometimes an additive microphone model is insufficient for real-world applications. Accordingly, ways to use real training data in supervised learning for DNS models promise to reduce a potential training/inference mismatch. Employing real data for DNS training requires eithe…

    Submitted 5 September, 2023; originally announced September 2023.

  4. arXiv:2307.15630  [pdf, ps, other]

    eess.AS

    Efficient Acoustic Echo Suppression with Condition-Aware Training

    Authors: Ernst Seidel, Pejman Mowlaee, Tim Fingscheidt

    Abstract: The topic of deep acoustic echo control (DAEC) has seen many approaches with various model topologies in recent years. Convolutional recurrent networks (CRNs), consisting of a convolutional encoder and decoder encompassing a recurrent bottleneck, are repeatedly employed due to their ability to preserve nearend speech even in double-talk (DT) condition. However, past architectures are either comput…

    Submitted 28 July, 2023; originally announced July 2023.

    Comments: 5 pages, accepted to WASPAA 2023

  5. arXiv:2306.02778  [pdf, other]

    eess.AS

    EffCRN: An Efficient Convolutional Recurrent Network for High-Performance Speech Enhancement

    Authors: Marvin Sach, Jan Franzen, Bruno Defraene, Kristoff Fluyt, Maximilian Strake, Wouter Tirry, Tim Fingscheidt

    Abstract: Fully convolutional recurrent neural networks (FCRNs) have shown state-of-the-art performance in single-channel speech enhancement. However, the number of parameters and the FLOPs/second of the original FCRN are restrictively high. A further important class of efficient networks is the CRUSE topology, serving as reference in our work. By applying a number of topological changes at once, we propose…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: 5 pages, 5 figures, accepted for Interspeech 2023

  6. arXiv:2304.09226  [pdf, other]

    eess.AS cs.SD

    Coded Speech Quality Measurement by a Non-Intrusive PESQ-DNN

    Authors: Ziyi Xu, Ziyue Zhao, Tim Fingscheidt

    Abstract: Wideband codecs such as AMR-WB or EVS are widely used in (mobile) speech communication. Evaluation of coded speech quality is often performed subjectively by an absolute category rating (ACR) listening test. However, the ACR test is impractical for online monitoring of speech communication networks. Perceptual evaluation of speech quality (PESQ) is one of the widely used metrics instrumentally pre…

    Submitted 18 April, 2023; originally announced April 2023.

  7. arXiv:2209.09735  [pdf, ps, other]

    cs.LG cs.CL eess.AS eess.IV

    Relaxed Attention for Transformer Models

    Authors: Timo Lohrenz, Björn Möller, Zhengyang Li, Tim Fingscheidt

    Abstract: The powerful modeling capabilities of all-attention-based transformer architectures often cause overfitting and - for natural language processing tasks - lead to an implicitly learned internal language model in the autoregressive transformer decoder complicating the integration of external language models. In this paper, we explore relaxed attention, a simple and easy-to-implement smoothing of the…

    Submitted 20 September, 2022; originally announced September 2022.

  8. arXiv:2205.04276  [pdf, ps, other]

    eess.AS cs.SD

    Bandwidth-Scalable Fully Mask-Based Deep FCRN Acoustic Echo Cancellation and Postfiltering

    Authors: Ernst Seidel, Rasmus Kongsgaard Olsson, Karim Haddad, Zhengyang Li, Pejman Mowlaee, Tim Fingscheidt

    Abstract: Although today's speech communication systems support various bandwidths from narrowband to super-wideband and beyond, state-of-the art DNN methods for acoustic echo cancellation (AEC) are lacking modularity and bandwidth scalability. Our proposed DNN model builds upon a fully convolutional recurrent network (FCRN) and introduces scalability over various bandwidths up to a fullband (FB) system (48…

    Submitted 7 November, 2022; v1 submitted 9 May, 2022; originally announced May 2022.

    Comments: 5 pages, 1 figure, accepted for IWAENC 2022

  9. arXiv:2205.02085  [pdf, other]

    eess.AS cs.SD

    Does a PESQNet (Loss) Require a Clean Reference Input? The Original PESQ Does, But ACR Listening Tests Don't

    Authors: Ziyi Xu, Maximilian Strake, Tim Fingscheidt

    Abstract: Perceptual evaluation of speech quality (PESQ) requires a clean speech reference as input, but predicts the results from (reference-free) absolute category rating (ACR) tests. In this work, we train a fully convolutional recurrent neural network (FCRN) as deep noise suppression (DNS) model, with either a non-intrusive or an intrusive PESQNet, where only the latter has access to a clean speech refe…

    Submitted 13 May, 2022; v1 submitted 4 May, 2022; originally announced May 2022.

  10. arXiv:2201.06415  [pdf, other]

    cs.CV eess.IV

    Improving Performance of Semantic Segmentation CycleGANs by Noise Injection into the Latent Segmentation Space

    Authors: Jonas Löhdefink, Tim Fingscheidt

    Abstract: In recent years, semantic segmentation has taken benefit from various works in computer vision. Inspired by the very versatile CycleGAN architecture, we combine semantic segmentation with the concept of cycle consistency to enable a multitask training protocol. However, learning is largely prevented by the so-called steganography effect, which expresses itself as watermarks in the latent segmentat…

    Submitted 17 January, 2022; originally announced January 2022.

  11. arXiv:2201.02834  [pdf, other]

    eess.SP cs.LG

    Reconfigurable Intelligent Surface Enabled Spatial Multiplexing with Fully Convolutional Network

    Authors: Bile Peng, Jan-Aike Termöhlen, Cong Sun, Danping He, Ke Guan, Tim Fingscheidt, Eduard A. Jorswieck

    Abstract: Reconfigurable intelligent surface (RIS) is an emerging technology for future wireless communication systems. In this work, we consider downlink spatial multiplexing enabled by the RIS for weighted sum-rate (WSR) maximization. In the literature, most solutions use alternating gradient-based optimization, which has moderate performance, high complexity, and limited scalability. We propose to apply…

    Submitted 21 September, 2022; v1 submitted 8 January, 2022; originally announced January 2022.

  12. arXiv:2111.03847  [pdf, other]

    eess.AS cs.SD

    Deep Noise Suppression Maximizing Non-Differentiable PESQ Mediated by a Non-Intrusive PESQNet

    Authors: Ziyi Xu, Maximilian Strake, Tim Fingscheidt

    Abstract: Speech enhancement employing deep neural networks (DNNs) for denoising are called deep noise suppression (DNS). During training, DNS methods are typically trained with mean squared error (MSE) type loss functions, which do not guarantee good perceptual quality. Perceptual evaluation of speech quality (PESQ) is a widely used metric for evaluating speech quality. However, the original PESQ algorithm…

    Submitted 6 November, 2021; originally announced November 2021.

  13. arXiv:2108.03051  [pdf, other]

    eess.AS

    Deep Residual Echo Suppression and Noise Reduction: A Multi-Input FCRN Approach in a Hybrid Speech Enhancement System

    Authors: Jan Franzen, Tim Fingscheidt

    Abstract: Deep neural network (DNN)-based approaches to acoustic echo cancellation (AEC) and hybrid speech enhancement systems have gained increasing attention recently, introducing significant performance improvements to this research field. Using the fully convolutional recurrent network (FCRN) architecture that is among state of the art topologies for noise reduction, we present a novel deep residual ech…

    Submitted 23 March, 2022; v1 submitted 6 August, 2021; originally announced August 2021.

    Comments: Accepted at IEEE ICASSP 2022

  14. arXiv:2107.01275  [pdf, ps, other]

    eess.AS cs.CL cs.LG cs.SD

    Relaxed Attention: A Simple Method to Boost Performance of End-to-End Automatic Speech Recognition

    Authors: Timo Lohrenz, Patrick Schwarz, Zhengyang Li, Tim Fingscheidt

    Abstract: Recently, attention-based encoder-decoder (AED) models have shown high performance for end-to-end automatic speech recognition (ASR) across several tasks. Addressing overconfidence in such models, in this paper we introduce the concept of relaxed attention, which is a simple gradual injection of a uniform distribution to the encoder-decoder attention weights during training that is easily implemen…

    Submitted 15 December, 2021; v1 submitted 2 July, 2021; originally announced July 2021.

    Comments: Accepted at ASRU 2021, code contributed to http://github.com/freewym/espresso

  15. arXiv:2104.00120  [pdf, ps, other]

    eess.AS cs.CL cs.LG cs.SD

    Multi-Encoder Learning and Stream Fusion for Transformer-Based End-to-End Automatic Speech Recognition

    Authors: Timo Lohrenz, Zhengyang Li, Tim Fingscheidt

    Abstract: Stream fusion, also known as system combination, is a common technique in automatic speech recognition for traditional hybrid hidden Markov model approaches, yet mostly unexplored for modern deep neural network end-to-end model architectures. Here, we investigate various fusion techniques for the all-attention-based encoder-decoder architecture known as the transformer, striving to achieve optimal…

    Submitted 14 July, 2021; v1 submitted 31 March, 2021; originally announced April 2021.

    Comments: accepted at INTERSPEECH 2021

  16. arXiv:2103.17189  [pdf, ps, other]

    eess.AS cs.SD

    Y$^2$-Net FCRN for Acoustic Echo and Noise Suppression

    Authors: Ernst Seidel, Jan Franzen, Maximilian Strake, Tim Fingscheidt

    Abstract: In recent years, deep neural networks (DNNs) were studied as an alternative to traditional acoustic echo cancellation (AEC) algorithms. The proposed models achieved remarkable performance for the separate tasks of AEC and residual echo suppression (RES). A promising network topology is a fully convolutional recurrent network (FCRN) structure, which has already proven its performance on both noise…

    Submitted 18 July, 2021; v1 submitted 31 March, 2021; originally announced March 2021.

    Comments: 5 pages, 2 figures, accepted for Interspeech 2021

  17. arXiv:2103.17088  [pdf, ps, other]

    eess.AS

    Deep Noise Suppression With Non-Intrusive PESQNet Supervision Enabling the Use of Real Training Data

    Authors: Ziyi Xu, Maximilian Strake, Tim Fingscheidt

    Abstract: Data-driven speech enhancement employing deep neural networks (DNNs) can provide state-of-the-art performance even in the presence of non-stationary noise. During the training process, most of the speech enhancement neural networks are trained in a fully supervised way with losses requiring noisy speech to be synthesized by clean speech and additive noise. However, in a real implementation, only t…

    Submitted 31 March, 2021; originally announced March 2021.

  18. arXiv:2103.09007  [pdf, other]

    eess.AS

    AEC in a NetShell: On Target and Topology Choices for FCRN Acoustic Echo Cancellation

    Authors: Jan Franzen, Ernst Seidel, Tim Fingscheidt

    Abstract: Acoustic echo cancellation (AEC) algorithms have a long-term steady role in signal processing, with approaches improving the performance of applications such as automotive hands-free systems, smart home and loudspeaker devices, or web conference systems. Just recently, very first deep neural network (DNN)-based approaches were proposed with a DNN for joint AEC and residual echo suppression (RES)/n…

    Submitted 16 March, 2021; originally announced March 2021.

    Comments: Accepted at IEEE ICASSP 2021

  19. An Application-Driven Conceptualization of Corner Cases for Perception in Highly Automated Driving

    Authors: Florian Heidecker, Jasmin Breitenstein, Kevin Rösch, Jonas Löhdefink, Maarten Bieshaar, Christoph Stiller, Tim Fingscheidt, Bernhard Sick

    Abstract: Systems and functions that rely on machine learning (ML) are the basis of highly automated driving. An essential task of such ML models is to reliably detect and interpret unusual, new, and potentially dangerous situations. The detection of those situations, which we refer to as corner cases, is highly relevant for successfully developing, applying, and validating automotive perception functions i…

    Submitted 5 March, 2021; originally announced March 2021.

    Comments: This paper is submitted to IEEE Intelligent Vehicles Symposium 2021

  20. arXiv:2012.01558  [pdf, other]

    cs.CV cs.LG eess.IV

    From a Fourier-Domain Perspective on Adversarial Examples to a Wiener Filter Defense for Semantic Segmentation

    Authors: Nikhil Kapoor, Andreas Bär, Serin Varghese, Jan David Schneider, Fabian Hüger, Peter Schlicht, Tim Fingscheidt

    Abstract: Despite recent advancements, deep neural networks are not robust against adversarial perturbations. Many of the proposed adversarial defense approaches use computationally expensive training mechanisms that do not scale to complex real-world tasks such as semantic segmentation, and offer only marginal improvements. In addition, fundamental questions on the nature of adversarial perturbations and t…

    Submitted 21 April, 2021; v1 submitted 2 December, 2020; originally announced December 2020.

    Comments: Accepted by The International Joint Conference on Neural Network (IJCNN) 2021

  21. arXiv:2005.06050  [pdf, other]

    cs.CV cs.LG eess.IV

    Class-Incremental Learning for Semantic Segmentation Re-Using Neither Old Data Nor Old Labels

    Authors: Marvin Klingner, Andreas Bär, Philipp Donn, Tim Fingscheidt

    Abstract: While neural networks trained for semantic segmentation are essential for perception in autonomous driving, most current algorithms assume a fixed number of classes, presenting a major limitation when developing new autonomous driving systems with the need of additional classes. In this paper we present a technique implementing class-incremental learning for semantic segmentation without using the…

    Submitted 12 May, 2020; originally announced May 2020.

    Comments: ITSC 2020 Conference Paper

  22. arXiv:1908.05087  [pdf, ps, other]

    eess.AS cs.SD

    Components Loss for Neural Networks in Mask-Based Speech Enhancement

    Authors: Ziyi Xu, Samy Elshamy, Ziyue Zhao, Tim Fingscheidt

    Abstract: Estimating time-frequency domain masks for single-channel speech enhancement using deep learning methods has recently become a popular research field with promising results. In this paper, we propose a novel components loss (CL) for the training of neural networks for mask-based speech enhancement. During the training process, the proposed CL offers separate control over preservation of the speech…

    Submitted 14 August, 2019; originally announced August 2019.

  23. arXiv:1905.09754  [pdf, ps, other]

    eess.AS cs.SD

    A Perceptual Weighting Filter Loss for DNN Training in Speech Enhancement

    Authors: Ziyue Zhao, Samy Elshamy, Tim Fingscheidt

    Abstract: Single-channel speech enhancement with deep neural networks (DNNs) has shown promising performance and is thus intensively being studied. In this paper, instead of applying the mean squared error (MSE) as the loss function during DNN training for speech enhancement, we design a perceptual weighting filter loss motivated by the weighting filter as it is employed in analysis-by-synthesis speech codi…

    Submitted 18 August, 2019; v1 submitted 23 May, 2019; originally announced May 2019.

  24. arXiv:1810.11217  [pdf, ps, other]

    eess.AS

    Concatenated Identical DNN (CI-DNN) to Reduce Noise-Type Dependence in DNN-Based Speech Enhancement

    Authors: Ziyi Xu, Maximilian Strake, Tim Fingscheidt

    Abstract: Estimating time-frequency domain masks for speech enhancement using deep learning approaches has recently become a popular field of research. In this paper, we propose a mask-based speech enhancement framework by using concatenated identical deep neural networks (CI-DNNs). The idea is that a single DNN is trained under multiple input and output signal-to-noise power ratio (SNR) conditions, using t…

    Submitted 26 October, 2018; originally announced October 2018.

  25. arXiv:1806.09411  [pdf, ps, other]

    eess.AS cs.SD

    Convolutional Neural Networks to Enhance Coded Speech

    Authors: Ziyue Zhao, Huijun Liu, Tim Fingscheidt

    Abstract: Enhancing coded speech suffering from far-end acoustic background noise, quantization noise, and potentially transmission errors, is a challenging task. In this work we propose two postprocessing approaches applying convolutional neural networks (CNNs) either in the time domain or the cepstral domain to enhance the coded speech without any modification of the codecs. The time domain approach follo…

    Submitted 24 January, 2019; v1 submitted 25 June, 2018; originally announced June 2018.

    Comments: More analysis are added for version 4