Showing 1–2 of 2 results for author: Charon, N

  1. arXiv:2211.05071  [pdf, other]

    eess.AS cs.SD

    A Diffeomorphic Flow-based Variational Framework for Multi-speaker Emotion Conversion

    Authors: Ravi Shankar, Hsi-Wei Hsieh, Nicolas Charon, Archana Venkataraman

    Abstract: This paper introduces a new framework for non-parallel emotion conversion in speech. Our framework is based on two key contributions. First, we propose a stochastic version of the popular CycleGAN model. Our modified loss function introduces a Kullback-Leibler (KL) divergence term that aligns the source and target data distributions learned by the generators, thus overcoming the limitations of sam…

    Submitted 9 November, 2022; originally announced November 2022.

    Comments: Accepted in IEEE Transactions on Audio, Speech and Language Processing

  2. arXiv:2007.12937  [pdf, other]

    eess.AS cs.LG cs.SD

    Multi-speaker Emotion Conversion via Latent Variable Regularization and a Chained Encoder-Decoder-Predictor Network

    Authors: Ravi Shankar, Hsi-Wei Hsieh, Nicolas Charon, Archana Venkataraman

    Abstract: We propose a novel method for emotion conversion in speech based on a chained encoder-decoder-predictor neural network architecture. The encoder constructs a latent embedding of the fundamental frequency (F0) contour and the spectrum, which we regularize using the Large Diffeomorphic Metric Mapping (LDDMM) registration framework. The decoder uses this embedding to predict the modified F0 contour i…

    Submitted 10 August, 2020; v1 submitted 25 July, 2020; originally announced July 2020.

    Comments: Paper Accepted in Interspeech 2020