Showing 1–18 of 18 results for author: Welker, S

  1. arXiv:2406.06185

    eess.AS cs.LG cs.SD

    EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation

    Authors: Julius Richter, Yi-Chiao Wu, Steven Krenn, Simon Welker, Bunlong Lay, Shinji Watanabe, Alexander Richard, Timo Gerkmann

    Abstract: We release the EARS (Expressive Anechoic Recordings of Speech) dataset, a high-quality speech dataset comprising 107 speakers from diverse backgrounds, totaling in 100 hours of clean, anechoic speech data. The dataset covers a large range of different speaking styles, including emotional speech, different reading styles, non-verbal sounds, and conversational freeform speech. We benchmark various m…

    Submitted 11 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  2. arXiv:2406.03460

    eess.AS cs.LG cs.SD

    The PESQetarian: On the Relevance of Goodhart's Law for Speech Enhancement

    Authors: Danilo de Oliveira, Simon Welker, Julius Richter, Timo Gerkmann

    Abstract: To obtain improved speech enhancement models, researchers often focus on increasing performance according to specific instrumental metrics. However, when the same metric is used in a loss function to optimize models, it may be detrimental to aspects that the given metric does not see. The goal of this paper is to illustrate the risk of overfitting a speech enhancement model to the metric used for…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  3. arXiv:2405.04272

    eess.AS cs.LG cs.SD

    BUDDy: Single-Channel Blind Unsupervised Dereverberation with Diffusion Models

    Authors: Eloi Moliner, Jean-Marie Lemercier, Simon Welker, Timo Gerkmann, Vesa Välimäki

    Abstract: In this paper, we present an unsupervised single-channel method for joint blind dereverberation and room impulse response estimation, based on posterior sampling with diffusion models. We parameterize the reverberation operator using a filter with exponential decay for each frequency subband, and iteratively estimate the corresponding parameters as the speech utterance gets refined along the rever…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Submitted to IWAENC 2024

  4. arXiv:2402.09821

    eess.AS cs.LG cs.SD

    Diffusion Models for Audio Restoration

    Authors: Jean-Marie Lemercier, Julius Richter, Simon Welker, Eloi Moliner, Vesa Välimäki, Timo Gerkmann

    Abstract: With the development of audio playback devices and fast data transmission, the demand for high sound quality is rising, for both entertainment and communications. In this quest for better sound quality, challenges emerge from distortions and interferences originating at the recording side or caused by an imperfect transmission pipeline. To address this problem, audio restoration methods aim to rec…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: Full paper invited to the IEEE Signal Processing Magazine Special Issue "Model-based and Data-Driven Audio Signal Processing"

  5. arXiv:2309.08639

    eess.IV eess.SP physics.comp-ph physics.optics

    Live Iterative Ptychography with projection-based algorithms

    Authors: Simon Welker, Tal Peer, Henry N. Chapman, Timo Gerkmann

    Abstract: In this work, we demonstrate that the ptychographic phase problem can be solved in a live fashion during scanning, while data is still being collected. We propose a generally applicable modification of the widespread projection-based algorithms such as Error Reduction (ER) and Difference Map (DM). This novel variant of ptychographic phase retrieval enables immediate visual feedback during experime…

    Submitted 19 September, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 24

  6. arXiv:2309.07828

    eess.AS cs.SD eess.SP

    EMOCONV-DIFF: Diffusion-based Speech Emotion Conversion for Non-parallel and In-the-wild Data

    Authors: Navin Raj Prabhu, Bunlong Lay, Simon Welker, Nale Lehmann-Willenbrock, Timo Gerkmann

    Abstract: Speech emotion conversion is the task of converting the expressed emotion of a spoken utterance to a target emotion while preserving the lexical content and speaker identity. While most existing works in speech emotion conversion rely on acted-out datasets and parallel data samples, in this work we specifically focus on more challenging in-the-wild scenarios and do not rely on parallel data. To th…

    Submitted 8 January, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: Accepted to ICASSP 2024

  7. arXiv:2309.07043

    eess.AS cs.SD eess.SP

    A Flexible Online Framework for Projection-Based STFT Phase Retrieval

    Authors: Tal Peer, Simon Welker, Johannes Kolhoff, Timo Gerkmann

    Abstract: Several recent contributions in the field of iterative STFT phase retrieval have demonstrated that the performance of the classical Griffin-Lim method can be considerably improved upon. By using the same projection operators as Griffin-Lim, but combining them in innovative ways, these approaches achieve better results in terms of both reconstruction quality and required number of iterations, while…

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 24

  8. arXiv:2306.12286

    eess.AS cs.LG cs.SD

    Diffusion Posterior Sampling for Informed Single-Channel Dereverberation

    Authors: Jean-Marie Lemercier, Simon Welker, Timo Gerkmann

    Abstract: We present in this paper an informed single-channel dereverberation method based on conditional generation with diffusion models. With knowledge of the room impulse response, the anechoic utterance is generated via reverse diffusion using a measurement consistency criterion coupled with a neural network that represents the clean speech prior. The proposed approach is largely more robust to measure…

    Submitted 21 June, 2023; originally announced June 2023.

  9. arXiv:2303.08674

    eess.AS cs.SD

    Speech Signal Improvement Using Causal Generative Diffusion Models

    Authors: Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay, Tal Peer, Timo Gerkmann

    Abstract: In this paper, we present a causal speech signal improvement system that is designed to handle different types of distortions. The method is based on a generative diffusion model which has been shown to work well in scenarios with missing data and non-linear corruptions. To guarantee causal processing, we modify the network architecture of our previous work and replace global normalization with ca…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  10. arXiv:2302.14748

    eess.AS cs.LG cs.SD

    Reducing the Prior Mismatch of Stochastic Differential Equations for Diffusion-based Speech Enhancement

    Authors: Bunlong Lay, Simon Welker, Julius Richter, Timo Gerkmann

    Abstract: Recently, score-based generative models have been successfully employed for the task of speech enhancement. A stochastic differential equation is used to model the iterative forward process, where at each step environmental noise and white Gaussian noise are added to the clean speech signal. While in limit the mean of the forward process ends at the noisy mixture, in practice it stops earlier and…

    Submitted 30 May, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

    Comments: 5 pages, 2 figures, Accepted to Interspeech 2023

  11. arXiv:2212.11851

    eess.AS cs.LG cs.SD

    StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation

    Authors: Jean-Marie Lemercier, Julius Richter, Simon Welker, Timo Gerkmann

    Abstract: Diffusion models have shown a great ability at bridging the performance gap between predictive and generative approaches for speech enhancement. We have shown that they may even outperform their predictive counterparts for non-additive corruption types or when they are evaluated on mismatched conditions. However, diffusion models suffer from a high computational burden, mainly as they require to r…

    Submitted 12 March, 2024; v1 submitted 22 December, 2022; originally announced December 2022.

    Comments: Published in IEEE/ACM Transactions on Audio, Speech and Language Processing, 2023

  12. arXiv:2211.06757

    eess.IV cs.CV cs.LG

    DriftRec: Adapting diffusion models to blind JPEG restoration

    Authors: Simon Welker, Henry N. Chapman, Timo Gerkmann

    Abstract: In this work, we utilize the high-fidelity generation abilities of diffusion models to solve blind JPEG restoration at high compression levels. We propose an elegant modification of the forward stochastic differential equation of diffusion models to adapt them to this restoration task and name our method DriftRec. Comparing DriftRec against an $L_2$ regression baseline with the same network archit…

    Submitted 3 April, 2024; v1 submitted 12 November, 2022; originally announced November 2022.

    Comments: © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

    Report number: pp. 2795 - 2807

    Journal ref: IEEE Transactions on Image Processing, Vol. 33, 2024

  13. arXiv:2211.04332

    eess.AS cs.LG cs.SD eess.SP

    DiffPhase: Generative Diffusion-based STFT Phase Retrieval

    Authors: Tal Peer, Simon Welker, Timo Gerkmann

    Abstract: Diffusion probabilistic models have been recently used in a variety of tasks, including speech enhancement and synthesis. As a generative approach, diffusion models have been shown to be especially suitable for imputation problems, where missing data is generated based on existing data. Phase retrieval is inherently an imputation problem, where phase information has to be generated based on the gi…

    Submitted 2 June, 2023; v1 submitted 8 November, 2022; originally announced November 2022.

    Comments: Accepted by ICASSP 2023

    Journal ref: 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  14. arXiv:2211.02397

    eess.AS cs.LG cs.SD

    Analysing Diffusion-based Generative Approaches versus Discriminative Approaches for Speech Restoration

    Authors: Jean-Marie Lemercier, Julius Richter, Simon Welker, Timo Gerkmann

    Abstract: Diffusion-based generative models have had a high impact on the computer vision and speech processing communities these past years. Besides data generation tasks, they have also been employed for data restoration tasks like speech enhancement and dereverberation. While discriminative models have traditionally been argued to be more powerful e.g. for speech enhancement, generative diffusion approac…

    Submitted 16 March, 2023; v1 submitted 4 November, 2022; originally announced November 2022.

    Comments: © 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

    Journal ref: ICASSP 2023 - IEEE International Conference on Acoustics, Speech and Signal Processing

  15. arXiv:2208.05830

    eess.AS cs.LG cs.SD

    Speech Enhancement and Dereverberation with Diffusion-based Generative Models

    Authors: Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay, Timo Gerkmann

    Abstract: In this work, we build upon our previous publication and use diffusion-based generative models for speech enhancement. We present a detailed overview of the diffusion process that is based on a stochastic differential equation and delve into an extensive theoretical examination of its implications. Opposed to usual conditional generation tasks, we do not start the reverse process from pure Gaussia…

    Submitted 13 June, 2023; v1 submitted 11 August, 2022; originally announced August 2022.

    Comments: Accepted version

  16. Beyond Griffin-Lim: Improved Iterative Phase Retrieval for Speech

    Authors: Tal Peer, Simon Welker, Timo Gerkmann

    Abstract: Phase retrieval is a problem encountered not only in speech and audio processing, but in many other fields such as optics. Iterative algorithms based on non-convex set projections are effective and frequently used for retrieving the phase when only STFT magnitudes are available. While the basic Griffin-Lim algorithm and its variants have been the prevalent method for decades, more recent advances,…

    Submitted 11 May, 2022; originally announced May 2022.

    Comments: Submitted to IWAENC 2022

  17. arXiv:2203.17004

    eess.AS cs.LG cs.SD

    Speech Enhancement with Score-Based Generative Models in the Complex STFT Domain

    Authors: Simon Welker, Julius Richter, Timo Gerkmann

    Abstract: Score-based generative models (SGMs) have recently shown impressive results for difficult generative tasks such as the unconditional and conditional generation of natural images and audio signals. In this work, we extend these models to the complex short-time Fourier transform (STFT) domain, proposing a novel training task for speech enhancement using a complex-valued deep neural network. We deriv…

    Submitted 7 July, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: Accepted by Interspeech 2022

  18. arXiv:2202.10573

    eess.IV cs.LG eess.AS eess.SP

    Deep Iterative Phase Retrieval for Ptychography

    Authors: Simon Welker, Tal Peer, Henry N. Chapman, Timo Gerkmann

    Abstract: One of the most prominent challenges in the field of diffractive imaging is the phase retrieval (PR) problem: In order to reconstruct an object from its diffraction pattern, the inverse Fourier transform must be computed. This is only possible given the full complex-valued diffraction data, i.e. magnitude and phase. However, in diffractive imaging, generally only magnitudes can be directly measure…

    Submitted 17 February, 2022; originally announced February 2022.

    Comments: © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

    Journal ref: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)