Showing 1–5 of 5 results for author: Carbonneau, M

  1. arXiv:2311.08667

    cs.SD eess.AS

    EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis

    Authors: Ge Zhu, Yutong Wen, Marc-André Carbonneau, Zhiyao Duan

    Abstract: Audio diffusion models can synthesize a wide variety of sounds. Existing models often operate on the latent domain with cascaded phase recovery modules to reconstruct waveform. This poses challenges when generating high-fidelity audio. In this paper, we propose EDMSound, a diffusion-based generative model in spectrogram domain under the framework of elucidated diffusion models (EDM). Combining wit…

    Submitted 18 November, 2023; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: Accepted at NeurIPS Workshop: Machine Learning for Audio (Camera Ready)

  2. arXiv:2307.06040

    eess.AS cs.LG cs.SD

    Rhythm Modeling for Voice Conversion

    Authors: Benjamin van Niekerk, Marc-André Carbonneau, Herman Kamper

    Abstract: Voice conversion aims to transform source speech into a different target voice. However, typical voice conversion systems do not account for rhythm, which is an important factor in the perception of speaker identity. To bridge this gap, we introduce Urhythmic, an unsupervised method for rhythm conversion that does not require parallel data or text transcriptions. Using self-supervised representatio…

    Submitted 12 July, 2023; originally announced July 2023.

    Comments: 5 pages, 4 figures, 4 tables, submitted to IEEE Signal Processing Letters

  3. A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion

    Authors: Benjamin van Niekerk, Marc-André Carbonneau, Julian Zaïdi, Mathew Baas, Hugo Seuté, Herman Kamper

    Abstract: The goal of voice conversion is to transform source speech into a target voice, keeping the content unchanged. In this paper, we focus on self-supervised representation learning for voice conversion. Specifically, we compare discrete and soft speech units as input features. We find that discrete representations effectively remove speaker information but discard some linguistic content - leading to…

    Submitted 8 June, 2022; v1 submitted 3 November, 2021; originally announced November 2021.

    Comments: 5 pages, 2 figures, 2 tables. Accepted at ICASSP 2022

  4. Daft-Exprt: Cross-Speaker Prosody Transfer on Any Text for Expressive Speech Synthesis

    Authors: Julian Zaïdi, Hugo Seuté, Benjamin van Niekerk, Marc-André Carbonneau

    Abstract: This paper presents Daft-Exprt, a multi-speaker acoustic model advancing the state-of-the-art for cross-speaker prosody transfer on any text. This is one of the most challenging, and rarely directly addressed, tasks in speech synthesis, especially for highly expressive data. Daft-Exprt uses FiLM conditioning layers to strategically inject different prosodic information in all parts of the architect…

    Submitted 5 April, 2022; v1 submitted 4 August, 2021; originally announced August 2021.

    Comments: Submitted to Interspeech 2022, 5 pages, 5 figures, 2 tables

    Journal ref: Proc. Interspeech (2022) 4591-4595

  5. arXiv:2103.12177

    cs.LG eess.SP

    Energy Disaggregation using Variational Autoencoders

    Authors: Antoine Langevin, Marc-André Carbonneau, Mohamed Cheriet, Ghyslain Gagnon

    Abstract: Non-intrusive load monitoring (NILM) is a technique that uses a single sensor to measure the total power consumption of a building. Using an energy disaggregation method, the consumption of individual appliances can be estimated from the aggregate measurement. Recent disaggregation algorithms have significantly improved the performance of NILM systems. However, the generalization capability of the…

    Submitted 19 July, 2021; v1 submitted 22 March, 2021; originally announced March 2021.

    Comments: 13 pages, 2 figures, results for the REFIT dataset added