Showing 1–9 of 9 results for author: Lay, B

Searching in archive eess.
  1. arXiv:2406.06185  [pdf, other]

    eess.AS cs.LG cs.SD

    EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation

    Authors: Julius Richter, Yi-Chiao Wu, Steven Krenn, Simon Welker, Bunlong Lay, Shinji Watanabe, Alexander Richard, Timo Gerkmann

    Abstract: We release the EARS (Expressive Anechoic Recordings of Speech) dataset, a high-quality speech dataset comprising 107 speakers from diverse backgrounds, totaling 100 hours of clean, anechoic speech data. The dataset covers a large range of different speaking styles, including emotional speech, different reading styles, non-verbal sounds, and conversational freeform speech. We benchmark various m…

    Submitted 11 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  2. arXiv:2402.00811  [pdf, other]

    eess.AS cs.LG cs.SD

    An Analysis of the Variance of Diffusion-based Speech Enhancement

    Authors: Bunlong Lay, Timo Gerkmann

    Abstract: Diffusion models proved to be powerful models for generative speech enhancement. In recent SGMSE+ approaches, training involves a stochastic differential equation for the diffusion process, adding both Gaussian and environmental noise to the clean speech signal gradually. The speech enhancement performance varies depending on the choice of the stochastic differential equation that controls the evo…

    Submitted 13 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: 5 pages, 3 figures, 1 table

  3. arXiv:2309.09677  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD

    Single and Few-step Diffusion for Generative Speech Enhancement

    Authors: Bunlong Lay, Jean-Marie Lemercier, Julius Richter, Timo Gerkmann

    Abstract: Diffusion models have shown promising results in speech enhancement, using a task-adapted diffusion process for the conditional generation of clean speech given a noisy mixture. However, at test time, the neural network used for score estimation is called multiple times to solve the iterative reverse process. This results in a slow inference process and causes discretization errors that accumulate…

    Submitted 15 January, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: copyright 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  4. arXiv:2309.07828  [pdf, other]

    eess.AS cs.SD eess.SP

    EMOCONV-DIFF: Diffusion-based Speech Emotion Conversion for Non-parallel and In-the-wild Data

    Authors: Navin Raj Prabhu, Bunlong Lay, Simon Welker, Nale Lehmann-Willenbrock, Timo Gerkmann

    Abstract: Speech emotion conversion is the task of converting the expressed emotion of a spoken utterance to a target emotion while preserving the lexical content and speaker identity. While most existing works in speech emotion conversion rely on acted-out datasets and parallel data samples, in this work we specifically focus on more challenging in-the-wild scenarios and do not rely on parallel data. To th…

    Submitted 8 January, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: Accepted to ICASSP 2024

  5. arXiv:2303.08674  [pdf, other]

    eess.AS cs.SD

    Speech Signal Improvement Using Causal Generative Diffusion Models

    Authors: Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay, Tal Peer, Timo Gerkmann

    Abstract: In this paper, we present a causal speech signal improvement system that is designed to handle different types of distortions. The method is based on a generative diffusion model which has been shown to work well in scenarios with missing data and non-linear corruptions. To guarantee causal processing, we modify the network architecture of our previous work and replace global normalization with ca…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  6. arXiv:2302.14748  [pdf, other]

    eess.AS cs.LG cs.SD

    Reducing the Prior Mismatch of Stochastic Differential Equations for Diffusion-based Speech Enhancement

    Authors: Bunlong Lay, Simon Welker, Julius Richter, Timo Gerkmann

    Abstract: Recently, score-based generative models have been successfully employed for the task of speech enhancement. A stochastic differential equation is used to model the iterative forward process, where at each step environmental noise and white Gaussian noise are added to the clean speech signal. While in the limit the mean of the forward process ends at the noisy mixture, in practice it stops earlier and…

    Submitted 30 May, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

    Comments: 5 pages, 2 figures, Accepted to Interspeech 2023

  7. arXiv:2208.05830  [pdf, other]

    eess.AS cs.LG cs.SD

    Speech Enhancement and Dereverberation with Diffusion-based Generative Models

    Authors: Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay, Timo Gerkmann

    Abstract: In this work, we build upon our previous publication and use diffusion-based generative models for speech enhancement. We present a detailed overview of the diffusion process that is based on a stochastic differential equation and delve into an extensive theoretical examination of its implications. As opposed to usual conditional generation tasks, we do not start the reverse process from pure Gaussia…

    Submitted 13 June, 2023; v1 submitted 11 August, 2022; originally announced August 2022.

    Comments: Accepted version

  8. arXiv:1906.11875  [pdf]

    eess.IV cs.CV

    Instant automatic diagnosis of diabetic retinopathy

    Authors: Gwenolé Quellec, Mathieu Lamard, Bruno Lay, Alexandre Le Guilcher, Ali Erginay, Béatrice Cochener, Pascale Massin

    Abstract: The purpose of this study is to evaluate the performance of the OphtAI system for the automatic detection of referable diabetic retinopathy (DR) and the automatic assessment of DR severity using color fundus photography. OphtAI relies on ensembles of convolutional neural networks trained to recognize eye laterality, detect referable DR and assess DR severity. The system can either process single i…

    Submitted 12 June, 2019; originally announced June 2019.

  9. Dealing with Topological Information within a Fully Convolutional Neural Network

    Authors: Etienne Decencière, Santiago Velasco-Forero, Fu Min, Juanjuan Chen, Hélène Burdin, Gervais Gauthier, Bruno Laÿ, Thomas Bornschloegl, Thérèse Baldeweck

    Abstract: A fully convolutional neural network has a receptive field of limited size and therefore cannot exploit global information, such as topological information. A solution is proposed in this paper to solve this problem, based on pre-processing with a geodesic operator. It is applied to the segmentation of histological images of pigmented reconstructed epidermis acquired via Whole Slide Imaging.

    Submitted 27 June, 2019; originally announced June 2019.

    Comments: International Conference on Advanced Concepts for Intelligent Vision Systems (ACIVS 2018)

    Journal ref: Advanced Concepts for Intelligent Vision Systems. ACIVS 2018. Lecture Notes in Computer Science, vol 11182. Springer, Cham